Supporting Scientific Data

Before you get started in earnest…

This guide has its roots in an Introduction to Research Data Management workshop that is regularly taught at Stanford Medicine. That workshop is designed to be standalone, briskly paced, and largely didactic. While such a workshop can cover certain concepts and terms, managing scientific data effectively is not a process that can be covered or implemented just once.

This guide aims to be descriptive: it outlines major concepts and practices related to data management in a way that is digestible to biomedical researchers. More prescriptive recommendations and requirements largely come from three sources: the institution where the research takes place, the organizations that fund the research, and the research community itself. Policies and requirements from these entities will be discussed when relevant, but they should not mark the end of your thinking about data management; they should be the beginning.

Data management should be an integral part of the day-to-day work of conducting research.

This guide also aims to provide actionable information. The following sections include a variety of exercises, thought experiments, tools, and links to additional resources for further developing data management and sharing practices. Again and again, these chapters return to a concept creatively named Good Data Management Practice, but they remain largely tool-agnostic. Though specific data-related tools and platforms may be mentioned, the focus is on behaviors.

Spoiler alert: There are more than a few worksheets and checklists ahead.

As you read, keep in mind that data management is not an all-or-nothing endeavor. Like many practices and concepts in biomedical research, data management strategies and practices exist along a continuum. One difficulty many researchers face when managing data is that there are always new tools, new requirements and expectations, and new norms.

For these reasons, this guide is intended to be a living document, updated over time and in response to feedback. But because we are all only human, it cannot possibly cover every strategy and practice for managing the multifarious types of data involved in biomedical research.

Finally, a few notes on terminology.

  1. These modules follow a very inclusive definition of the term researcher. Professors, postdocs, and graduate students working on research projects are researchers; so are the lab techs, research assistants, staff, and other professionals who work alongside them.
  2. For somewhat related reasons, the term data is typically used as a singular mass noun (“data is”) rather than as a plural of datum (“data are”).

This guide is built upon a broad and inclusive definition of what “counts” as both a researcher and research data. Isolating an individual “datum” is often quite difficult. Making use of even a single measurement from an instrument requires some additional information about the processes and parameters used to collect it. For our purposes, a single data point is not typically a useful level of description. Neither is the spreadsheet that contains that data point. Everything needed to reconstruct the workflow that precedes a research finding “counts” as data.

Sorry, not sorry.

Have fun!?
John