The Data Management Association (DAMA) defines data management as "the development of architectures, policies, practices, and procedures to manage the data lifecycle."
In other words: Data management is "the process of collecting, keeping, and using data in a cost-effective, secure, and efficient manner" (Simplilearn).
High-quality data management in your research will improve efficiency, data quality, security, accessibility, compliance and governance, disaster recovery, and decision-making, among other benefits.
Research Data Management (RDM) processes include:
The Washington State University Libraries have created a thorough guide on RDM in pre-research, research, and post-research stages:
More Resources
Tools
The following are a few tools that may be useful in organizing and managing your data.
[Infographic: "6 Strategies to Get Started with Data Equity" -- source: @namaste data]
[Image source: Institute of Mathematical Statistics, 2019]
In order to support open science and increase access to and reuse of data, proposed best practices emphasize that research data should be FAIR: Findable, Accessible, Interoperable, and Reusable.
The following resources will help you better understand what FAIR means and how to achieve it.
The Global Indigenous Data Alliance (GIDA) observes that "the current movement toward open data and open science does not fully engage with Indigenous Peoples rights and interests," and they assert "the right to create value from Indigenous data in ways that are grounded in Indigenous worldviews and realise opportunities within the knowledge economy."
The CARE principles ask "researchers and those who manage or control research infrastructures to examine the data lifecycle from a people and purpose orientation. ...These principles complement the existing FAIR principles encouraging open and other data movements to consider both people and purpose in their advocacy and pursuits."
The full and summary CARE principles documents on the GIDA website provide significantly more detail, but in summary:
Data wrangling is the process that transforms raw data into usable data -- cleaning, merging, adapting, and otherwise preparing it for analysis!
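As a small illustration of what wrangling can look like in practice, here is a minimal sketch using only the Python standard library; the data sources, field names, and values are all hypothetical:

```python
# Hypothetical raw records from two sources: a recruitment sheet and a survey export.
# Wrangling here means merging them on a shared ID and normalizing inconsistent formatting.
recruitment = [
    {"id": "001", "name": "  Ada Lovelace "},
    {"id": "002", "name": "Alan Turing"},
]
survey = [
    {"id": "001", "score": "87"},
    {"id": "002", "score": "92"},
]

def wrangle(recruitment, survey):
    """Merge the two sources on 'id', trim whitespace, and cast scores to int."""
    scores = {row["id"]: int(row["score"]) for row in survey}
    merged = []
    for row in recruitment:
        merged.append({
            "id": row["id"],
            "name": row["name"].strip(),     # normalize stray whitespace
            "score": scores.get(row["id"]),  # None if no survey response exists
        })
    return merged

tidy = wrangle(recruitment, survey)
```

In real projects the same steps are usually done with a dedicated tool (a spreadsheet, OpenRefine, or a library such as pandas), but the shape of the work is the same: merge, normalize, convert types.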
Some Data Wrangling Tools:
Cleaning your data involves taking steps to ensure that the compiled data points are complete, consistent, and correct. Data should conform to all rules in your data dictionary. In many cases, "clean" will also indicate that the data has been de-identified.
In short, "clean" means you have critically examined all the data as it was entered by human or machine, and you have verified that it is ready to be analyzed and to produce valid results.
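One way to check data against a data dictionary programmatically is sketched below; the fields, rules, and example rows are hypothetical, not drawn from any particular study:

```python
# A toy "data dictionary": each field maps to a rule its values must satisfy.
DATA_DICTIONARY = {
    "participant_id": lambda v: isinstance(v, str) and len(v) == 4,
    "age":            lambda v: isinstance(v, int) and 18 <= v <= 120,
    "group":          lambda v: v in {"control", "treatment"},
}

def find_violations(record):
    """Return the names of fields that are missing or break a dictionary rule."""
    bad = []
    for field, rule in DATA_DICTIONARY.items():
        if field not in record or not rule(record[field]):
            bad.append(field)
    return bad

clean_row = {"participant_id": "P001", "age": 34, "group": "control"}
dirty_row = {"participant_id": "P2", "age": 200, "group": "placebo"}
```

Running `find_violations` over every row turns "conform to all rules in your data dictionary" into a repeatable, documented check rather than a one-time eyeball pass.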
The link below, Part 1 of a 3-part tutorial, provides a more detailed discussion of what "clean" data entails and which aspects to think about.
In order to clean our data effectively and efficiently, we should establish a basic workflow to follow rather than approaching the problem haphazardly, using reproducible methods as much as possible--for example, cleaning with code instead of by hand, and creating robust documentation and change logs.
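One way to make cleaning reproducible is to apply every change in code and record it in a change log as you go. The sketch below shows the idea; the specific cleaning steps are hypothetical examples:

```python
def clean_values(values):
    """Apply cleaning steps in a fixed order, logging every change made."""
    log = []
    cleaned = []
    for i, v in enumerate(values):
        original = v
        v = v.strip()        # step 1: remove stray whitespace
        v = v.lower()        # step 2: standardize case
        if v == "n/a":       # step 3: standardize missing-value codes
            v = ""
        if v != original:
            log.append(f"row {i}: {original!r} -> {v!r}")
        cleaned.append(v)
    return cleaned, log

cleaned, log = clean_values([" Yes", "no", "N/A"])
```

Because the steps live in code and every change is logged, anyone (including your future self) can rerun the cleaning on the raw data and get the same result, and the log doubles as documentation of what was altered.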
The link below, Part 2 of a 3-part tutorial, suggests workflow steps and documentation to consider.
Finally, the link below -- Part 3 of the 3-part tutorial -- walks through a real-world example to illustrate how to follow a data cleaning workflow.