
Good Enough Practices in Data Management

Best practices in data management are an ideal to strive for, but they are not attainable in every context. Here are a few situations in which a researcher may not be able to implement what a data management expert would consider the gold standard.

  1. An individual researcher joins a team that is still developing its own standard operating procedures.
  2. A research team works in an area in which community standards and best practices have not been widely adopted.
  3. A research team does not have access to the resources or tools (e.g. infrastructure) needed to implement best practices.
  4. There is not sufficient buy-in from data stakeholders (institutions, funders, etc.), so it is challenging to develop a culture in which data management is emphasized and implemented in a routine, standardized fashion.

The following is not an explicit set of requirements or standards. Any effort to change practice across a large and heterogeneous population of scientists is slow and incremental, so in situations like those above it can be helpful to have a starting point.


First Steps

Develop a document that describes the specific practices you will implement related to saving, organizing, and describing data and other project-related materials (i.e. a data management plan).

  1. Write out a description of your data (e.g. the files that will be included, additional materials that are needed to make use of datasets).
  2. Understand special considerations that apply to your data (e.g. if it contains sensitive information, including protected health information).
  3. Identify local expertise and sources of guidance that can help with data-related questions.

Data Documentation and Description

Make it easy for existing and new collaborators to understand and build upon your data and workflow by developing good documentation.

  1. Record all the steps used to collect, process, and analyze data (e.g. in a lab notebook or protocol).
  2. Record the functional details of any research-related software and code, including version number, dependencies, etc.
  3. Record the structure and contents of data-related files (e.g. in data dictionaries, codebooks, etc.).
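
As one possible starting point for the third step, the sketch below drafts a data dictionary from a CSV file using only the Python standard library. The function names (`infer_type`, `build_data_dictionary`), the type labels, and the fields of each entry are illustrative assumptions, not a community standard; the `description` field is deliberately left blank because it has to be written by someone who knows the data.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label for a column from its non-empty values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    # Try the strictest interpretation first, then fall back.
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text):
    """Return one draft dictionary entry per column of a CSV given as text.

    The entry fields are hypothetical; adapt them to your own codebook.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat those as empty.
        values = [(row.get(column) or "") for row in rows]
        entries.append({
            "variable": column,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "description": "",  # to be filled in by hand
        })
    return entries


if __name__ == "__main__":
    example = "subject_id,age,score,notes\n1,34,0.5,ok\n2,,0.75,\n"
    for entry in build_data_dictionary(example):
        print(entry)
```

A draft generated this way only captures structure (names, rough types, missing counts). Units, allowed values, and the meaning of each variable still need to be recorded manually.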

Data Storage and Organization

Organize project-related files (including datasets) in a way that makes it easy to find materials as needed.

  1. Save both raw and intermediate forms of your data.
  2. Back up data frequently, in multiple locations.
  3. Give files names that reflect their contents and/or function.
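
The second and third steps can be partly automated. The sketch below is a minimal illustration, assuming a naming pattern of project, content, ISO date, and version (the pattern itself and the helper names `descriptive_name`, `sha256_of`, and `back_up` are hypothetical choices, not a prescribed convention). It copies a file to several backup directories and confirms each copy with a SHA-256 checksum.

```python
import hashlib
import re
import shutil
from datetime import date
from pathlib import Path


def descriptive_name(project, content, day, version, extension):
    """Build a file name from project, content, date, and version."""
    def clean(part):
        # Lowercase; replace runs of non-alphanumeric characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

    return (
        f"{clean(project)}_{clean(content)}_"
        f"{day.isoformat()}_v{version:02d}.{extension.lstrip('.')}"
    )


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination_dirs):
    """Copy one file into each destination directory and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

For example, `descriptive_name("Proj X", "Raw Survey", date(2024, 3, 1), 2, ".csv")` produces a name that sorts by date and version. In practice the destination directories should sit on physically separate storage (for instance an external drive and an institutional network share), since several copies on one disk do not protect against hardware failure.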

Data Preservation and Sharing

Ensure your data will be (re)usable over the long term.

  1. Create a “final” dataset that underlies research findings and is accompanied by any documentation/materials needed for it to be (re)used.
  2. Move files for internal use from a working storage platform (e.g. Google Drive) to an option suitable for long-term preservation.
  3. Deposit final datasets (including any related files) into specialized or generalized data repositories that assign persistent identifiers.
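
Before moving or depositing a final dataset, it can help to record exactly which files it contains so that later copies can be checked against the original. The sketch below writes a simple JSON manifest of relative paths, sizes, and SHA-256 checksums. The manifest layout and the function names `build_manifest` and `write_manifest` are assumptions made for illustration; individual repositories may have their own packaging or metadata requirements.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(dataset_dir):
    """List every file under dataset_dir with its size and SHA-256 checksum."""
    root = Path(dataset_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Reads each file fully into memory; chunked reading would suit
        # very large files better.
        data = path.read_bytes()
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return entries


def write_manifest(dataset_dir, manifest_path):
    """Write the manifest as JSON and return its entries."""
    entries = build_manifest(dataset_dir)
    Path(manifest_path).write_text(
        json.dumps(entries, indent=2) + "\n", encoding="utf-8"
    )
    return entries
```

Writing the manifest outside the dataset directory (or regenerating it last) avoids the manifest listing itself. After a transfer, rebuilding the manifest at the destination and comparing it with the saved one is a simple check that nothing was lost or altered.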