ETD Data Pilot Program

This guide provides submission guidelines and information about the ETD Data Pilot Program.

Glossary of Terms

Term Definition
anonymized data Data about individuals that does not reveal the identity of any of the individuals, and cannot be linked to other data that would reveal the identity of individuals.
archive The transfer of material to a facility that appraises, preserves, and provides access to that material on a long-term or permanent basis.
code Computer code or scripts. In the context of data management, this may include code used in the collection, manipulation, processing, analysis or visualization of data, but may also include software developed for other purposes.
confidentiality The right of privacy and of non-release of disclosed personal information. Applies to data collected on human subjects. Researchers may be subject to legal requirements to prevent the release of private, personally identifiable information provided by research subjects.
controlled vocabulary An established list of standardized terminology for use in indexing and retrieval of information.
copyright A set of legal rights extended to copyright owners (the author or creator, or other party to whom the rights have been assigned) that govern such activities as reproducing, distributing, adapting, or exhibiting original works fixed in tangible form. Copyright does not apply to factual information; as a result it does not apply to data.
data Data subject to data management planning requirements may be defined differently by different funders, programs, or research communities. In the absence of specific guidance, consider the Office of Management and Budget's definition of data: "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings" (OMB Circular A-110). In developing a data management plan, researchers should consider which data would be required to verify their results and which data would have the highest potential and value for reuse by others.
derivative (of data) Any data, publication, illustration or visualization, or other work that rearranges, presents, or otherwise makes use of an existing data set.
file format The particular structure used for encoding data in a computer file. File formats are usually identified by the file extension (e.g. .xlsx, .csv, .dbf). File formats may be proprietary or open (with a readily available specification or description of the format). Open file formats usually maximize the potential for reuse and longevity.
identifier A unique and long-lasting reference that allows for continued access to a digital object. Examples of persistent identifier systems include Digital Object Identifiers (DOIs), handles, and Archival Resource Keys (ARKs). Persistent identifiers support interoperability and the reliable citation of digital content.
institutional repository A service storing and providing online access to digital content. Content is typically produced by the institution that hosts the service.
intellectual property Rights applied to creative works, including (but not limited to) copyright, patents, trademarks, and trade secrets.
license In the context of data management, a legal instrument that expresses the terms of use of a data set.
metadata Documentation or information about a data set. It may be embedded in the data itself, or exist separately from the data. Metadata may describe the ownership, purpose, methods, organization, and conditions for use of data, technical information about the data, and other information. Many metadata standards exist across a broad range of disciplines and applications.
non-proprietary Conforming to standards that are in the public domain or are widely licensed, and so not restricted to one manufacturer.
open access Typically used to describe publications, open access refers to online, freely available material that has few or no copyright or licensing restrictions. (Suber, 2004)
preservation (of data) Ensuring that data remain intact, accessible, and understandable over time. This requires preserving the integrity of the digital files themselves, but can be considerably more complicated. Preservation operations may include preserving the software required to interact with the data or emulating older systems, migrating data to new formats and new media, and ensuring there is sufficient metadata to understand, interpret, manage, and preserve the data.
privacy The protection of personal information from unauthorized access by others.
public access policy Public access policies ensure that the results of research are freely available to the public. This term is generally used by funders to refer to policies that align with the objectives of the OSTP memo, "Increasing Access to the Results of Federally Funded Scientific Research."
restricted data or restricted access data Data which are made available under stringent, secure conditions. Typically confidential or sensitive data.
security Methods of protecting data from unauthorized access, modification, or destruction.
standards Accepted methods or models of practice; these may be formally approved (as in NISO standards), or de facto standards. In the context of data management, standards typically apply to data or file formats, and to metadata.
usage statement An expression of the conditions under which a data set may be used. May be formal, as in a license or contract, or an informal expression of the preferences of the data owner(s).

More comprehensive lists of terms are available:

This table was adapted from the Research Data Management Service Group at Cornell University.