Research Data Management Guide

This guide is intended to help you plan, collect, clean, preserve, and share research data.

What is Data Cleaning?

Cleaning your data involves taking steps to ensure that the compiled data points are complete, consistent, and correct. Data should conform to all rules in your data dictionary. In many cases, "clean" will also indicate that the data has been de-identified.
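As an illustration of checking data against a data dictionary, here is a minimal Python sketch. The dictionary layout, the field names (participant_id, age, group), and the validate_record helper are all hypothetical, invented for this example; your own data dictionary will define different fields and rules.

```python
# Hypothetical data dictionary: each field name maps to the rules it must follow.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "required": True},
    "age": {"type": int, "required": True, "min": 18, "max": 99},
    "group": {"type": str, "required": True, "allowed": {"control", "treatment"}},
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rules in dictionary.items():
        value = record.get(field)

        # Completeness: required fields must be present and non-empty.
        if value is None or value == "":
            if rules.get("required"):
                problems.append(f"{field}: missing required value")
            continue

        # Correctness: the value must have the documented type.
        expected = rules["type"]
        if not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
            continue

        # Consistency: the value must fall within the documented range or codes.
        if "min" in rules and value < rules["min"]:
            problems.append(f"{field}: {value} is below minimum {rules['min']}")
        if "max" in rules and value > rules["max"]:
            problems.append(f"{field}: {value} is above maximum {rules['max']}")
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{field}: {value!r} is not an allowed value")

    # Fields that the data dictionary does not describe are flagged as well.
    for field in record:
        if field not in dictionary:
            problems.append(f"{field}: not defined in data dictionary")
    return problems


if __name__ == "__main__":
    sample = {"participant_id": "P002", "age": 17, "group": "placebo"}
    for problem in validate_record(sample):
        print(problem)
```

A record with no problems returns an empty list, so the same function can be run over every row and the results collected into a report for review before any values are changed.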

In short, "clean" means you have critically examined all the data as it was entered by human or machine, and you have verified that it is ready to be analyzed and to produce valid results.

The link below, Part 1 of a 3-part tutorial, provides a more detailed discussion of what "clean" data entails and what aspects to think about.

Workflow for Data Cleaning

To clean our data effectively and efficiently, we should establish a basic workflow that we can follow, rather than approaching the problem haphazardly. We should use reproducible methods as much as possible: for example, cleaning the data with code rather than by hand, and creating robust documentation and change logs.
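One way to make cleaning reproducible is to express each fix as a named rule in code and to record every change the rules make. The following Python sketch assumes that rows are dictionaries of string values; the rule names, the group field, and the clean_rows helper are hypothetical and exist only for illustration.

```python
# Each rule is (name, field it applies to or None for all fields, transform).
# These two rules are illustrative assumptions, not a recommended standard set.
RULES = [
    ("strip_whitespace", None, str.strip),
    ("lowercase_group", "group", str.lower),
]


def clean_rows(rows):
    """Apply RULES to copies of the rows; return (cleaned_rows, change_log)."""
    cleaned = []
    change_log = []
    for row_number, row in enumerate(rows, start=1):
        new_row = dict(row)  # work on a copy so the raw data stays untouched
        for rule_name, only_field, transform in RULES:
            for field in new_row:
                if only_field is not None and field != only_field:
                    continue
                old_value = new_row[field]
                new_value = transform(old_value)
                if new_value != old_value:
                    new_row[field] = new_value
                    # Record what changed, where, and which rule did it.
                    change_log.append(
                        {
                            "row": row_number,
                            "field": field,
                            "old": old_value,
                            "new": new_value,
                            "rule": rule_name,
                        }
                    )
        cleaned.append(new_row)
    return cleaned, change_log


if __name__ == "__main__":
    raw = [
        {"participant_id": " P001", "group": "Control "},
        {"participant_id": "P002", "group": "treatment"},
    ]
    cleaned, log = clean_rows(raw)
    for entry in log:
        print(entry)
```

Because the raw rows are never modified and every change is logged with its rule name, the cleaning can be rerun from the original file at any time, and the log can be saved alongside the cleaned data as documentation.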

The link below, Part 2 of a 3-part tutorial, suggests workflow steps and documentation to consider.


Finally, the link below, Part 3 of the 3-part tutorial, walks through a real-world example to illustrate how to follow a data cleaning workflow.

Data Cleaning Tools


Newton Gresham Library | (936) 294-1614 | (866) NGL-INFO

Sam Houston State University | Huntsville, Texas 77341 | (936) 294-1111 | (866) BEARKAT
© Copyright Sam Houston State University | All rights reserved. | A Member of The Texas State University System