ETD Data Pilot Program

This guide provides submission guidelines and information about the ETD Data Pilot Program.

Preparing Your Data For Submission

Good documentation and metadata, along with archiving in a preferred file format, can help ensure continued long-term access to and reusability of your data. When submitting data, students should submit the documentation, metadata, and data as a compressed file in the supplemental field on their ETD submission form in Digital Commons. Below is a general outline for preparing your data. Each tab provides additional details to consider as you prepare your data for submission:

  • Generate your documentation and complete metadata.
  • Ensure data is in an appropriate format.
  • Follow all privacy and ethical standards set by the University and ensure your right to archive the data.

Documentation and Metadata

Good documentation of data can help ensure that data can be understood and interpreted by any user. Documentation should start at the beginning of a project and continue throughout the research.

When submitting data, students are required to include appropriate documentation and metadata in a readme.txt file. The documentation should include:

  • How data was created
  • What the data means
  • The data’s content and structure
  • Any manipulations that may have taken place

Metadata is a subset of core data documentation. Though metadata standards vary across disciplines, all metadata provides standardized, structured information explaining the purpose, origin, time references, geographic locations, creator, access conditions, and terms of use of data.

Along with the documentation of your project and data, you must include a metadata.txt file that includes:

  • Title: The name given to the data/dataset
  • Title of Thesis or Dissertation
  • Author: Who created the data?
  • Keywords: What is the data about? Consider using a controlled vocabulary like searchFAST or LCSH
  • Description: What is the resource about? Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content
  • Date: When was the resource created?
  • Format: What is the format of the data (PDF/A, Excel Spreadsheet, etc.)?
  • Source: Where did the data come from?
  • Language: What language is the data in?
  • Coverage: Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity)
  • Rights: Who owns the data?
  • Methodological Information: How was the data collected?

Use the Readme_[LastName].txt to compile your documentation and metadata for submission.
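As an illustration only, the metadata fields listed above could be laid out in a plain-text skeleton with a short script. The sketch below is not the program's official template: the function names `build_readme_template` and `write_readme`, the `#` prompt lines, and the `Field: value` layout are all assumptions made for this example.

```python
from pathlib import Path

# Field names follow the metadata list in this guide; prompts are abbreviated.
METADATA_FIELDS = [
    ("Title", "The name given to the data/dataset"),
    ("Title of Thesis or Dissertation", ""),
    ("Author", "Who created the data?"),
    ("Keywords", "What is the data about?"),
    ("Description", "What is the resource about?"),
    ("Date", "When was the resource created?"),
    ("Format", "What is the format of the data?"),
    ("Source", "Where did the data come from?"),
    ("Language", "What language is the data in?"),
    ("Coverage", "Spatial location, temporal period, or jurisdiction"),
    ("Rights", "Who owns the data?"),
    ("Methodological Information", "How was the data collected?"),
]


def build_readme_template(values=None):
    """Return the skeleton text; `values` optionally pre-fills fields by name."""
    values = values or {}
    lines = []
    for name, prompt in METADATA_FIELDS:
        if prompt:
            lines.append(f"# {prompt}")  # hypothetical prompt-line convention
        lines.append(f"{name}: {values.get(name, '')}")
        lines.append("")  # blank line between fields
    return "\n".join(lines)


def write_readme(last_name, directory=".", values=None):
    """Write Readme_<last_name>.txt as UTF-8 and return its path."""
    path = Path(directory) / f"Readme_{last_name}.txt"
    path.write_text(build_readme_template(values), encoding="utf-8")
    return path
```

Calling `write_readme("Doe", values={"Author": "Jane Doe"})` would create `Readme_Doe.txt` in the current folder with the Author field filled in and the remaining fields left blank for you to complete.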

 

Format

Choosing an appropriate format for the data is also an important aspect of long-term access and usability. Below are basic guidelines for choosing a format when preparing data for submission:

  • Non-proprietary
  • Open, documented standards
  • Commonly used by your community/discipline
  • Standard character encoding (ASCII, UTF-8)
  • Unencrypted
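The character-encoding guideline above can be checked before you submit. The following is a minimal sketch, assuming your text-based files are small enough to read into memory; the function name `is_utf8` is chosen for this example. Because ASCII is a subset of UTF-8, a plain ASCII file also passes.

```python
def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file that passes this check is valid UTF-8, but the check cannot confirm that the characters display as you intended, so it is still worth opening the file and spot-checking accented characters and symbols.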

Below is a table that outlines acceptable data formats. Data not in these formats will only be accepted if deemed appropriate upon review.

| Type of data | Acceptable formats for sharing, reuse and preservation | Other acceptable formats for data preservation |
| --- | --- | --- |
| Quantitative tabular data with extensive metadata (a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data) | SPSS portable format (.por); delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information; some structured text or mark-up file containing metadata information, e.g. DDI XML file | proprietary formats of statistical packages, e.g. SPSS (.sav), Stata (.dta); MS Access (.mdb/.accdb) |
| Quantitative tabular data with minimal metadata (a matrix of data with or without column headings or variable names, but no other metadata or labelling) | comma-separated values (CSV) file (.csv); tab-delimited file (.tab); including delimited text of given character set with SQL data definition statements where appropriate | delimited text of given character set, where only characters not present in the data should be used as delimiters (.txt); widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods) |
| Geospatial data (vector and raster data) | ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn); geo-referenced TIFF (.tif, .tfw); CAD data (.dwg); tabular GIS attribute data | ESRI Geodatabase format (.mdb); MapInfo Interchange Format (.mif) for vector data; Keyhole Mark-up Language (KML) (.kml); Adobe Illustrator (.ai), CAD data (.dxf or .svg); binary formats of GIS and CAD packages |
| Qualitative data (textual) | eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml); Rich Text Format (.rtf); plain text data, ASCII (.txt) | Hypertext Mark-up Language (HTML) (.html); widely-used proprietary formats, e.g. MS Word (.doc/.docx); some proprietary/software-specific formats, e.g. NUD*IST, NVivo and ATLAS.ti |
| Digital image data | TIFF version 6 uncompressed (.tif) | JPEG (.jpeg, .jpg) but only if created in this format; TIFF (other versions) (.tif, .tiff); Adobe Portable Document Format (PDF/A, PDF) (.pdf); standard applicable RAW image format (.raw); Photoshop files (.psd) |
| Digital audio data | Free Lossless Audio Codec (FLAC) (.flac) | MPEG-1 Audio Layer 3 (.mp3) but only if created in this format; Audio Interchange File Format (AIFF) (.aif); Waveform Audio Format (WAV) (.wav) |
| Digital video data | MPEG-4 (.mp4); motion JPEG 2000 (.mj2) | |
| Documentation and scripts | Rich Text Format (.rtf); PDF/A or PDF (.pdf); HTML (.htm); OpenDocument Text (.odt) | plain text (.txt); some widely-used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx); XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0 |

Source: UK Data Archive, http://www.data-archive.ac.uk/create-manage/format/formats-table
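If you have many files, a short script can give a first-pass indication of where each one falls in the formats listed above. This is only a sketch built on assumptions: the data-type keys, the labels "preferred" and "other acceptable" (shorthand for the two format columns), and the function name `classify` are invented for this example, and only formats with a listed file extension are included. An extension alone cannot distinguish, for instance, TIFF version 6 from other TIFF versions, so treat the result as a prompt for review and not a decision.

```python
from pathlib import Path

# data type -> (extensions from the first format column, from the second).
# Keys and groupings are illustrative; entries without a listed extension
# (for example DDI XML setup files) are left out.
FORMATS = {
    "tabular_extensive": ({".por"}, {".sav", ".dta", ".mdb", ".accdb"}),
    "tabular_minimal": (
        {".csv", ".tab"},
        {".txt", ".xls", ".xlsx", ".mdb", ".accdb", ".dbf", ".ods"},
    ),
    "geospatial": (
        {".shp", ".shx", ".dbf", ".prj", ".sbx", ".sbn", ".tif", ".tfw", ".dwg"},
        {".mdb", ".mif", ".kml", ".ai", ".dxf", ".svg"},
    ),
    "textual": ({".xml", ".rtf", ".txt"}, {".html", ".doc", ".docx"}),
    "image": ({".tif"}, {".jpeg", ".jpg", ".tiff", ".pdf", ".raw", ".psd"}),
    "audio": ({".flac"}, {".mp3", ".aif", ".wav"}),
    "video": ({".mp4", ".mj2"}, set()),
    "documentation": (
        {".rtf", ".pdf", ".htm", ".odt"},
        {".txt", ".doc", ".docx", ".xls", ".xlsx", ".xml"},
    ),
}


def classify(filename, data_type):
    """Return 'preferred', 'other acceptable', or 'needs review'."""
    preferred, other = FORMATS[data_type]
    extension = Path(filename).suffix.lower()  # case-insensitive match
    if extension in preferred:
        return "preferred"
    if extension in other:
        return "other acceptable"
    return "needs review"
```

For example, `classify("survey.csv", "tabular_minimal")` returns `"preferred"`, while a file whose extension is not listed for its data type returns `"needs review"`, matching the note above that other formats are accepted only if deemed appropriate upon review.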

 

File Organization

1. Use a Naming Convention

Using a naming convention for your files will assist you and other researchers in using your data. If there are established conventions for your research group and/or discipline you should use them. If there aren't any file naming conventions already being used by your research group/discipline, you can create your own. These are some best practices when creating a naming convention:

  • Use file naming consistently
  • Make sure the names clearly represent what the file is
  • Use short informative words or phrases and try to keep file names under 25 characters
  • Avoid using these symbols: " / \ : * < > [ ] & $
  • Use underscores (_) not spaces to separate terms
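The mechanical parts of these best practices (length, symbols, and spaces) can be checked automatically. Below is a minimal sketch; the function name `check_file_name` is hypothetical, and it assumes the 25-character guideline counts the whole file name including its extension, which this guide does not specify. Whether a name is consistent and clearly describes the file still needs your own judgment.

```python
# Symbols this guide says to avoid in file names.
FORBIDDEN = set('"/\\:*<>[]&$')
MAX_LENGTH = 25  # names should stay under this many characters


def check_file_name(name):
    """Return a list of problems found in `name`; an empty list means none."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append(f"{len(name)} characters; aim for under {MAX_LENGTH}")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("forbidden symbols: " + " ".join(bad))
    if " " in name:
        problems.append("contains spaces; use underscores instead")
    return problems
```

A name such as `soil_ph_2019.csv` produces an empty list, while `soil ph.csv` is flagged for containing a space.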

2. Group Your Files into Meaningful Datasets

There are three widely used file structures for organizing your data:

  • Hierarchical: Items are organized in folders and subfolders
  • Tag-based: Each item is assigned one or more tags
  • Hybrid: A mix of hierarchical and tag-based

Researchers working with human subjects must take additional steps to adhere to local and federal regulations in order to ensure the privacy of the individuals participating in their research projects. This applies not only to medical and health research, but also to research projects that include polls, surveys, and focus groups.

The FIU Institutional Review Board (IRB) is a committee established under federal regulations for the protection of human subjects in research (45 CFR 46). Its purpose is to help protect the rights and welfare of human participants in research. FIU faculty, staff, and students are required to obtain IRB approval prior to conducting research with human subjects. This applies to both on-campus and off-campus research, regardless of funding.

Students submitting data obtained from human subjects must follow all privacy and ethical standards set forth by University, State, and Federal governing bodies. When submitting your ETD and data, you will be required to attest that you have complied with all privacy regulations related to human subjects in your study.

Access/Permission Checklist

  • Do you have permission from your research advisor to share this data?
  • Does your data include any private information, medical information, or other information with possible confidentiality concerns?
  • Does your research project have the permissions necessary to disseminate the project data?
  • Did any of the data come from a third party source?
  • If so, did the project obtain permission to disseminate?
  • Are there any restrictions you need to include in your permission statement?

Submitting your data alongside your ETD

All graduate students submitting a thesis or dissertation through the graduate school are eligible to submit and archive their finalized data sets alongside their ETD. All data sets will be made openly accessible through the Library's dPanther system and Digital Commons.

Students submitting their data should review the section "Preparing Your Data For Submission," which outlines documentation and metadata, format, and copyright information.

All data should be submitted as a compressed file in the supplemental field on the ETD submission form in Digital Commons. The compressed file should include:

  1. Dataset(s)
  2. A completed Readme_[LastName].txt.
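One way to build the compressed file is with a short script. The sketch below makes several assumptions that this guide does not state: it uses the ZIP format, places dataset files under a `data/` folder inside the archive, and uses the hypothetical function name `package_submission`. Check with the program if a particular compression format is expected.

```python
import zipfile
from pathlib import Path


def package_submission(data_dir, readme_path, archive_path):
    """Bundle the readme and every file under data_dir into one ZIP archive."""
    data_dir = Path(data_dir)
    readme_path = Path(readme_path)
    archive_path = Path(archive_path)
    if not readme_path.is_file():
        raise FileNotFoundError(f"Readme not found: {readme_path}")

    # Avoid adding the readme twice, or the archive to itself, if either
    # happens to live inside data_dir.
    skip = {readme_path.resolve(), archive_path.resolve()}

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The readme goes at the top level of the archive.
        archive.write(readme_path, arcname=readme_path.name)
        for file in sorted(data_dir.rglob("*")):
            if file.is_file() and file.resolve() not in skip:
                relative = file.relative_to(data_dir).as_posix()
                archive.write(file, arcname=f"data/{relative}")
    return archive_path
```

For example, `package_submission("datasets", "Readme_Doe.txt", "Doe_dissertation_data.zip")` would produce a single archive ready to upload; the folder and file names here are placeholders.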

Review the criteria below to determine if your data is appropriate and ready for submission.

 

Is your data right for submission?

  • Is this data related to your Thesis or Dissertation?
  • Is this data your original data?
  • Can this data be made publicly accessible through the IR?

Data Documentation:

  • Do you have documentation for your data? If not, use the Readme_[LastName].txt to create appropriate metadata and documentation.
  • Is your data file naming convention clear and consistent to an outside user? Is your data grouped into datasets that make sense?

Sharing and Permissions:

  • Have all collaborators, advisors, or other interested parties agreed to sharing the data publicly through the IR?
  • Is the data anonymized to protect any personally identifiable information?

 

You can submit your data alongside your ETD in Digital Commons.

1. Follow the submission instructions for your ETD.

2. Before pressing the "Submit" button, check the box under "Additional Files."

 

3. Upload your compressed file. The compressed file should include the Readme_[LastName].txt along with your data. The compressed file's name will be the title that appears. Be sure that the "show" box is unchecked. Select "Continue" to complete your submission.