ZSR Library

Managing and Preserving Research Data: The Data Lifecycle

What is the Research Data Lifecycle?

The data lifecycle model provides a visual depiction of the typical phases that data passes through from creation to preservation. Thinking about data as having an evolving set of curation requirements can help you select the right tools and strategies for each stage. 

You'll likely come across many versions of the data lifecycle model; the following is a typical representation. (1)

Figure: Research Data Lifecycle


Research projects generally begin with a review of existing literature and the development of a new research question. Methods for data collection are devised.


Once the researcher has a plan and a research question, data collection can begin. Research can involve quantitative, qualitative, or mixed methods. Data typically needs some cleaning and processing before it can be analyzed, and there is a wide range of tools and software programs that can assist with this. 


Data cleanup and processing can involve:

  • Data wrangling, in which a data set is cleaned and transformed from its raw form into something more accessible and usable. This is also known as data cleaning, data munging, or data remediation.
  • Data compression, in which data is transformed into a format that can be more efficiently stored.
  • Data encryption, in which data is encoded so that only authorized parties can read it, protecting the privacy and confidentiality of the people and information it describes.(2)
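The first two steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a recommended workflow: the sample data, column names, and the function names `wrangle` and `to_compressed_csv` are all hypothetical, and real projects often use dedicated tools for this work. Encryption is left out here because it is best handled with a vetted, purpose-built tool rather than hand-written code.

```python
import csv
import gzip
import io

# Hypothetical raw survey export: stray whitespace, inconsistent case,
# a missing value, and a duplicate row.
RAW_CSV = """participant_id,age,response
 P01 ,34,Yes
P02,,no
P03,41, YES
P03,41, YES
"""

FIELDS = ["participant_id", "age", "response"]


def wrangle(raw_text):
    """Return cleaned rows: trimmed, normalized, and de-duplicated."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        record = {
            "participant_id": row["participant_id"].strip(),
            # Keep a missing age as None rather than guessing a value.
            "age": int(row["age"]) if row["age"].strip() else None,
            "response": row["response"].strip().lower(),
        }
        if record["participant_id"] in seen:
            continue  # skip repeated participants
        seen.add(record["participant_id"])
        cleaned.append(record)
    return cleaned


def to_compressed_csv(rows):
    """Serialize rows to CSV text and gzip-compress the resulting bytes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))


rows = wrangle(RAW_CSV)
packed = to_compressed_csv(rows)

# gzip is lossless: decompressing returns exactly the cleaned CSV text.
restored = gzip.decompress(packed).decode("utf-8")
```

Keeping the raw export untouched and writing the cleaned version to a separate file makes it possible to retrace or redo the cleaning later.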

Data analysis tools include programs like SPSS, Stata, R, and MAXQDA.


Journals have varying requirements for the inclusion of datasets in published articles. If you're the recipient of a federal research grant, you'll also have specific requirements for sharing your data and making it publicly accessible. Tools like the DMPTool are available to help create plans for data management and sharing.


Long-term data stewardship and curation require identifying a repository for data storage and ensuring that your data have good descriptive information (codebooks, metadata) to support future reuse. Privacy and access control are important to consider at this stage.
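As one illustration of descriptive information, the sketch below builds a small metadata record with a codebook and a checksum that can be stored alongside a data file. It is a minimal example under stated assumptions: the field names, the sample data, and the function name `build_metadata` are hypothetical, and repositories typically define their own metadata requirements.

```python
import hashlib
import json


def build_metadata(data_bytes, title, variables):
    """Describe a data file: title, size, checksum, and a variable codebook."""
    return {
        "title": title,
        "size_bytes": len(data_bytes),
        # A SHA-256 checksum lets a future user confirm the file is unchanged.
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "variables": variables,
    }


# Hypothetical data file contents and codebook entries.
data = b"participant_id,age,response\nP01,34,yes\n"
codebook = {
    "participant_id": "Anonymized participant identifier",
    "age": "Age in years at time of survey; blank if not reported",
    "response": "Answer to the screening question: yes or no",
}

metadata = build_metadata(data, "Example survey responses", codebook)

# Human-readable JSON text that could be saved next to the data file.
sidecar = json.dumps(metadata, indent=2, sort_keys=True)
```

Plain-text formats such as JSON are a reasonable choice for this kind of record because they remain readable without specialized software.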

Data repositories include platforms like Dataverse, Figshare, Zenodo, and Dryad, as well as discipline-specific repositories. 


(1) Green, Ann G., and Myron P. Gutmann. (2007) “Building Partnerships among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53.

(2) 8 steps in the Data Lifecycle. Harvard Business School (online): 
