
Managing and Preserving Research Data: The Data Lifecycle

What is the Research Data Lifecycle?

The data lifecycle model provides a visual depiction of the typical phases that data passes through from creation to preservation. Thinking about data as having an evolving set of curation requirements can help you select the right tools and strategies for each stage. 

You'll likely come across many variations of the data lifecycle, but the following is a typical representation of the model. (1)

[Figure: Research Data Lifecycle]

DISCOVERY AND PLANNING

Research projects generally begin with a review of existing literature and the development of a new research question. Methods for data collection are devised.

INITIAL DATA COLLECTION

Once the researcher has a plan and a research question, data collection can begin. Research can involve quantitative, qualitative, or mixed methods. Data typically needs some cleaning and processing before it can be analyzed, and there is a wide range of tools and software programs that can assist with this. 

FINAL DATA PREPARATION AND ANALYSIS 

Data cleanup and processing can involve:

  • Data wrangling, in which a data set is cleaned and transformed from its raw form into something more accessible and usable. This is also known as data cleaning, data munging, or data remediation.
  • Data compression, in which data is transformed into a format that can be more efficiently stored.
  • Data encryption, in which data is translated into another form of code so that only authorized users can read it, protecting privacy and confidentiality. (2)
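
The first two steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a recommended workflow: the sample survey data, the column names, and the helper functions (wrangle, to_csv_text) are all hypothetical, and the cleaning rules (trim whitespace, tidy column names, drop incomplete and duplicate rows) are just one reasonable set of assumptions. Encryption is left out because it is best handled with a vetted, dedicated tool rather than hand-written code.

```python
import csv
import gzip
import io

# Hypothetical raw survey export; names and values are invented for illustration.
RAW_CSV = """Participant ID, Age ,Response
001, 34 ,Yes
002,,no
001, 34 ,Yes
003, 29 ,NO
"""


def wrangle(raw_text):
    """Return cleaned rows: tidy headers, trimmed values,
    no incomplete rows, no exact duplicates."""
    reader = csv.reader(io.StringIO(raw_text))
    # "Participant ID" -> "participant_id", " Age " -> "age"
    header = [h.strip().lower().replace(" ", "_") for h in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        values = [v.strip() for v in row]
        # Assumption: rows with any empty or missing field are dropped.
        if len(values) != len(header) or any(v == "" for v in values):
            continue
        record = dict(zip(header, values))
        # Normalize free-text answers so "Yes", "NO", "no" compare consistently.
        record["response"] = record["response"].lower()
        key = tuple(record.values())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(record)
    return cleaned


def to_csv_text(rows):
    """Serialize cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = wrangle(RAW_CSV)
clean_text = to_csv_text(rows)

# Compression: gzip is lossless, so decompressing returns the identical text.
compressed = gzip.compress(clean_text.encode("utf-8"))
restored = gzip.decompress(compressed).decode("utf-8")
```

On a file this small, gzip may not save any space; the benefit shows up on larger datasets. Whatever cleaning rules you choose, keep the untouched raw file alongside the cleaned copy so the process can be repeated or audited.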

Data analysis tools include programs like SPSS, Stata, R, and MAXQDA.

PUBLISHING AND SHARING

Journals have varying requirements for the inclusion of datasets in published articles. It's important to remember that recipients of federal research grants typically must meet specific requirements for sharing their data and making it publicly accessible. Tools like the DMPTool are available to help create plans for data management and sharing.

LONG-TERM MANAGEMENT

Long-term data stewardship and curation require identifying a repository for data storage and ensuring that your data has good descriptive information (codebooks, metadata) to support future reuse. Privacy and access control are important to consider in this stage.
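
As a rough illustration of what a codebook can contain, the Python sketch below builds a simple one from a small dataset and writes it out as JSON. Everything here is an assumption made for the example: the sample rows, the variable labels, the build_codebook helper, and the field names (variable, label, n_present, n_missing, observed_values) do not follow any particular metadata standard, and a repository may ask for a specific schema instead.

```python
import json

# Hypothetical cleaned dataset; an empty string stands for a missing value.
rows = [
    {"participant_id": "001", "age": "34", "response": "yes"},
    {"participant_id": "003", "age": "", "response": "no"},
]

# Hypothetical human-readable descriptions supplied by the researcher.
labels = {
    "participant_id": "Anonymized participant identifier",
    "age": "Age in years at time of survey",
    "response": "Answer to the consent question (yes/no)",
}


def build_codebook(rows, labels, max_categories=10):
    """Describe each variable: its label, how many values are present or
    missing, and the observed values when there are only a few of them."""
    codebook = []
    for name in rows[0].keys():
        present = [r[name] for r in rows if r.get(name) not in ("", None)]
        entry = {
            "variable": name,
            "label": labels.get(name, ""),
            "n_present": len(present),
            "n_missing": len(rows) - len(present),
        }
        distinct = sorted(set(present))
        if len(distinct) <= max_categories:
            entry["observed_values"] = distinct
        codebook.append(entry)
    return codebook


codebook = build_codebook(rows, labels)
codebook_json = json.dumps(codebook, indent=2)
```

Saving a file like this next to the dataset gives future users the meaning of each variable without having to contact the original researcher. For identifying or sensitive variables, consider omitting the observed values from the public codebook.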

Data repositories include platforms like Dataverse, Figshare, Zenodo, and Dryad, as well as discipline-specific repositories. 

Sources:

(1) Green, Ann G., and Myron P. Gutmann (2007). “Building Partnerships among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives 23: 35-53. http://doi.org/10.1108/10650750710720757

(2) “8 Steps in the Data Lifecycle.” Harvard Business School (online). https://online.hbs.edu/blog/post/data-life-cycle