By John Borghi, PhD
This guide consists of a growing list of chapters, each covering a major topic in data management. These are complemented by a variety of explainers, checklists, and templates. In theory, these materials can all be read independently or in any order.
This guide is generally tool agnostic. Several sentences may be devoted to a certain popular spreadsheet program, but otherwise the focus is largely on practices and strategies.
When complete, this guide will consist of approximately 10 chapters. Below are those that are complete enough for public dissemination. There are also links to notes from the author, supplementary materials, and a glossary.
For the moment, even publicly available versions of all this are quite drafty. So please don’t hesitate to reach out with comments, suggestions, questions, and recriminations.
Chapter | Description |
---|---|
Understanding data management | This chapter defines data management in both technical and functional terms. |
Defining research data | Data is more than just an individual file or set of measurements. This chapter details how to understand all of the components of data as situated within a research workflow. |
Planning for data management | This chapter deals with the development of documentation describing how data is to be managed over the course of a project. This includes data management and sharing plans (DMSPs) as well as documentation designed for internal use, such as standard operating procedures. |
Documentation and description | If it is not documented, it did not happen. This chapter covers strategies and processes related to developing protocols, recording research workflow, and documenting the contents of specific data files. |
Data storage and organization | This chapter details day-to-day strategies and processes for ensuring that data and other research materials (documentation, code, physical samples, etc.) can be found and used as needed. |
Data sharing | This chapter covers how to share research data in an appropriate form within and beyond a research team. |
Monitoring and auditing | Data management is an iterative and continuous process. This chapter covers methods for ensuring that everyone involved is staying on track and ahead of evolving expectations and requirements. |
Sensitive data | Be careful. This chapter deals with the special precautions that must be taken when working with certain types of data, including data that contains protected health information (PHI) and data that comes with restrictions on how and by whom it can be used. |
Software and code | Code is not data, but code is (kind of) like data. This chapter deals with the management of code, scripts, and other computational tools. (Coming soon!) |
“Good Enough” | This is not a guide on how to conduct research, but data management is foundational to supporting research findings. This chapter bridges the gap. (Coming soon!) |
Notes from the Author | Glossary | Supplementary Materials |
Each of the above chapters is built around a specific principle of Good Data Management Practice. See the links below for a full rundown.
Good Data Management Practice | “Good Enough” Practices in Data Management |
Cite this project (and view archived releases):
All chapters are licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
If you’d like, you can view this project on GitHub.
Despite the ubiquity of em-dashes, no generative AI was used in the writing of this guide. If the writing seems robotic, that’s due to the (definitely human, totally not a robot) author. beep boop.