Supporting Scientific Data

What the heck is this?

The Supporting Scientific Data Guide is a set of informational chapters and resources developed to help scientific researchers with research data management. If there’s anything like a formal statement of purpose to the recipes, Taylor Swift quotes, and file naming tips that lie ahead, it would be “To facilitate scientific research through the advancement of data quality, usability, and provenance.” But that’s a bit too much jargon so early on. If research data is what supports research results, this guide is about supporting the data. This guide is about building and maintaining strong foundations on which to build research programs.

Where did this all come from?

The road to developing this guide was circuitous. It began, as many stories of data management do, with frustration and heartbreak. I only began to understand the importance of good data management when I was confronted with less than optimal data management. When I, as a PhD student, struggled to replicate processes done by my lab members (including myself) because of incomplete documentation. When I, on the verge of defending my dissertation, found myself promising a computer repair tech that they would be the first name in my acknowledgements section if they could recover the contents of my hard drive (they couldn’t, I didn’t).

Since then, as a librarian and administrator, I’ve taught, consulted, and collaborated with researchers to help them avoid these kinds of problems. The material in this guide comes from these experiences, but two efforts deserve special mention. This guide owes a great deal to the Support Your Data project, developed with an amazing set of collaborators at the California Digital Library. Yes, the name is barely different. No, I cannot think of another.

This guide also draws substantially from an Introduction to Research Data Management workshop that is regularly taught at Stanford School of Medicine. A huge number of students, faculty, librarians, and research staff members have provided feedback on this workshop over the years, which has improved it greatly. In addition to the content, the format of this workshop also served as a motivation for this guide. Like most instruction related to research data management, it is standalone, briskly paced, and largely didactic. While such a workshop can cover certain concepts and terms, it is simply not possible to equip researchers with all of the tools they need to manage data effectively in a single 90-minute session.

Before you get started in earnest…

This guide aims to be descriptive, outlining major concepts and practices related to data management in a way that is digestible to biomedical researchers. More prescriptive recommendations and requirements largely come from three sources: the institution where the research is taking place, organizations that provide research funding, and the research community itself. Policies and requirements from these entities will be discussed when relevant, but these requirements should not be an end to considerations related to data management; they should be a beginning.

This guide also aims to provide information that is actionable. The following sections include a variety of exercises, thought experiments, tools, and links to additional resources to further develop data management and sharing-related practices. Over and over, these chapters will return to a concept creatively named Good Data Management Practice, but will remain largely tool-agnostic. Though there may be mention of specific data-related tools and platforms, the focus will be on behaviors.

Spoiler alert: There are more than a few worksheets and checklists ahead.

As you read, keep in mind that data management is not an all-or-nothing endeavor. Like many of the practices and concepts in biomedical research, strategies and practices related to data management exist along a continuum. One difficulty many researchers face when managing data is that there are always new tools, new requirements and expectations, and new norms.

For these reasons, this guide is intended to be a living document, updated over time and in response to feedback. But, because we are all only human, it cannot possibly cover every strategy and practice related to managing the multifarious types of data involved in biomedical research.

Finally, a few notes on terminology.

These modules follow a very inclusive definition of the term researcher. Professors, postdocs, and graduate students working on research projects are researchers; so are the lab techs, research assistants, staff, and other professionals who work alongside them. For somewhat related reasons, the term data is typically used as a singular mass noun (“data is”) rather than as a plural of datum (“data are”).

This guide is built upon a broad and inclusive definition of what “counts” as both a researcher and as research data. Isolating an individual “datum” is often quite difficult. Making use of even a single measurement from an instrument requires knowing some additional information about the processes and parameters used in its collection. For our purposes, a single point of data is not typically a useful level of description. Neither is the spreadsheet that includes that point of data. Everything needed to reconstruct the workflow that precedes a research finding “counts” as data.

Sorry, not sorry.

Have fun!?
John