Scholarly Communication Support: Planning, Conducting, Disseminating, Promoting, & Assessing Research

This guide will acquaint researchers with knowledge and tools to assist in planning, conducting, disseminating, promoting, and assessing research.

Collecting & Managing Your Research Data

The Data Management Association (DAMA) defines data management as "the development of architectures, policies, practices, and procedures to manage the data lifecycle."

In other words: Data management is "the process of collecting, keeping, and using data in a cost-effective, secure, and efficient manner" (Simplilearn).

High-quality data management in your research will improve efficiency, data quality, security, accessibility, compliance and governance, disaster recovery, and decision-making.

Research Data Management (RDM) processes include:

  • Creating a data management plan before beginning data collection
  • Organizing your data logically, including directory structure and folder naming, file naming, file versioning, and file formats
  • Understanding copyright and licensing aspects of your data
  • Collecting documentation, throughout your research, about your files and contents; this should include metadata/README files and might also include lab notebooks
  • Ethically protecting sensitive data in terms of how it is collected, where it is stored, anonymization, and whether/how it is shared
  • Storing and backing up your data according to best practices, including considerations of physical security (e.g., locked lab, locked file cabinet) and digital security (e.g., anti-virus program, password protection, encryption)
  • Preserving and/or sharing data after research is complete
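Several of the organizational practices above (a logical directory structure, dated and versioned file names, and a README recording metadata) can be sketched in a few lines of Python. The project name, folder layout, and naming convention below are illustrative assumptions, not a prescribed standard:

```python
from pathlib import Path
from datetime import date

def scaffold_project(root: str) -> Path:
    """Create a simple, logically organized project directory."""
    base = Path(root)
    for sub in ("data/raw", "data/processed", "docs", "scripts"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    # A README documents the files and their contents (basic metadata)
    readme = base / "docs" / "README.txt"
    readme.write_text(
        "Project: example_study\n"
        f"Created: {date.today().isoformat()}\n"
        "Contents: raw exports in data/raw; cleaned files in data/processed\n"
    )
    return base

def versioned_name(stem: str, version: int, ext: str = "csv") -> str:
    """One possible file naming convention: name_YYYY-MM-DD_vNN.ext"""
    return f"{stem}_{date.today().isoformat()}_v{version:02d}.{ext}"

project = scaffold_project("example_study")
print(versioned_name("survey_responses", 3))
```

Keeping raw and processed data in separate folders, as sketched here, makes it harder to accidentally overwrite original data files.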

The Washington State University Libraries have created a thorough guide on RDM in the pre-research, research, and post-research stages.
More Resources

Tools

The following are a few tools that may be useful in organizing and managing your data.

 

Mindmap infographic with the central bubble labeled 6 Strategies to Get Started with Data Equity; the six branching nodes are Build Data Values, Data Collection Assessment, Partnerships for Data Equity, Team Trainings on Data Equity, Build Diverse Data Teams, and Community-Centric Data Collection; each of those nodes has two to three sub-nodes with details on that strategy

Plain-text of infographic above:

6 Strategies to Get Started with Data Equity

  • Build Data Values
    • Org-wide understanding of data practices
    • Include this step as part of strategic plan, so it lives for an extended time
  • Data Collection Assessment
    • Bring together all data collection sources
    • Assess if the intention of data collection aligns with strategic and DEI plans
  • Partnerships for Data Equity
    • Collaborate with other organizations
    • Share best practices and resources
  • Team Trainings on Data Equity
    • Build a data culture
    • Include internal and external experts
    • Build this as a continuous learning mechanism
  • Build Diverse Data Teams
    • Include all forms of diversity
    • Build cultural sensitivity in data teams around collection, analysis, and interpretation
  • Community-Centric Data Collection
    • Include the community
    • Accessible data collection tools
    • Humanized design and language

Source: @namaste data

To support open science and increase access to and reuse of data, best practices emphasize that research data should be FAIR: Findable, Accessible, Interoperable, and Reusable.

The following resources will help you better understand what FAIR means and how to achieve it.

Graphic reading "Be FAIR and CARE," combining the elements of the FAIR and CARE acronyms for data ethics

The Global Indigenous Data Alliance (GIDA) observes that "the current movement toward open data and open science does not fully engage with Indigenous Peoples' rights and interests," and they assert "the right to create value from Indigenous data in ways that are grounded in Indigenous worldviews and realise opportunities within the knowledge economy."

The CARE principles ask "researchers and those who manage or control research infrastructures to examine the data lifecycle from a people and purpose orientation. ...These principles complement the existing FAIR principles encouraging open and other data movements to consider both people and purpose in their advocacy and pursuits."

The full and summary CARE principles documents on the GIDA website provide significantly more detail, but in summary:

  • Collective Benefit: Data ecosystems shall be designed and function in ways that enable Indigenous Peoples to derive benefit from the data.
  • Authority to Control: Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data be empowered. Indigenous data governance enables Indigenous Peoples and governing bodies to determine how Indigenous Peoples, as well as Indigenous lands, territories, resources, knowledges and geographical indicators, are represented and identified within data.
  • Responsibility: Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous Peoples’ self-determination and collective benefit. Accountability requires meaningful and openly available evidence of these efforts and the benefits accruing to Indigenous Peoples.
  • Ethics: Indigenous Peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle and across the data ecosystem.

Data wrangling is the process of transforming raw data into usable data -- cleaning, merging, adapting, and otherwise preparing it for analysis.
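Merging is one of the most common wrangling steps: combining records from separate sources on a shared key. A minimal sketch in plain Python, using hypothetical participant records (the field names and values are invented for illustration):

```python
# Hypothetical example: merge demographic records with survey scores
# collected separately, joined on a shared participant ID.
demographics = [
    {"id": "P01", "age": 34},
    {"id": "P02", "age": 29},
]
scores = [
    {"id": "P01", "score": 87},
    {"id": "P02", "score": 92},
]

def merge_by_id(left, right, key="id"):
    """Combine two record sets on a shared key (an inner join):
    only IDs present in both sets appear in the result."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

merged = merge_by_id(demographics, scores)
# merged[0] -> {'id': 'P01', 'age': 34, 'score': 87}
```

In practice a library such as pandas would handle larger merges, but the logic is the same: build a lookup on the key, then combine matching records.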

 

Some Data Wrangling Tools:

Cleaning your data involves taking steps to ensure that the compiled data points are complete, consistent, and correct. Data should conform to all rules in your data dictionary. In many cases, "clean" will also indicate that the data has been de-identified.

In short, "clean" means you have critically examined all the data as it was entered by human or machine, and you have verified that it is ready to be analyzed and to produce valid results.
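The idea of checking each record against the rules in a data dictionary can be sketched as a small validation function. The data dictionary fields, types, and ranges below are hypothetical, chosen only to show the complete/consistent/correct checks described above:

```python
# Hypothetical data dictionary: each field with its expected type
# and either a numeric range or a set of allowed values.
DATA_DICTIONARY = {
    "age": {"type": int, "min": 0, "max": 120},
    "group": {"type": str, "allowed": {"control", "treatment"}},
}

def validate_row(row: dict) -> list[str]:
    """Return a list of rule violations for one record."""
    problems = []
    for field, rules in DATA_DICTIONARY.items():
        value = row.get(field)
        if value is None:
            problems.append(f"{field}: missing (data not complete)")
            continue
        if not isinstance(value, rules["type"]):
            problems.append(f"{field}: wrong type (data not consistent)")
            continue
        if "min" in rules and not (rules["min"] <= value <= rules["max"]):
            problems.append(f"{field}: out of range (data not correct)")
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{field}: not an allowed value")
    return problems

print(validate_row({"age": 34, "group": "control"}))   # []
print(validate_row({"age": 250, "group": "placebo"}))  # two violations
```

Running every record through a check like this, rather than eyeballing the data, makes the "complete, consistent, and correct" standard concrete and repeatable.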

The link below, Part 1 of a 3-part tutorial, provides a more detailed discussion of what "clean" data entails and what aspects to think about.


In order to clean our data effectively and efficiently, we should establish a basic workflow to follow, rather than approaching the problem haphazardly. Use reproducible methods as much as possible -- for example, performing transformations with code rather than manual edits, and creating robust documentation and change logs.
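A change log can be as simple as a timestamped text file that your cleaning script writes to after every transformation. A minimal sketch (the file name and log entries are invented for illustration):

```python
from datetime import datetime

def log_step(logfile: str, description: str) -> None:
    """Append a timestamped entry to a plain-text change log,
    so every transformation of the data is documented."""
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(f"{datetime.now().isoformat(timespec='seconds')}  {description}\n")

# A cleaning script calls log_step after each action it performs:
log_step("cleaning_log.txt", "Dropped 3 rows with missing participant IDs")
log_step("cleaning_log.txt", "Recoded group labels to lowercase")
```

Because the log grows alongside the code that produced it, anyone (including your future self) can reconstruct exactly what was done to the raw data and in what order.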

The link below, Part 2 of a 3-part tutorial, suggests workflow steps and documentation to consider.


Finally, the link below -- Part 3 of the 3-part tutorial -- walks through a real-world example to illustrate how to follow a data cleaning workflow.


 

 

Newton Gresham Library | (936) 294-1614 | (866) NGL-INFO | Ask a Question | Share a Suggestion

Sam Houston State University | Huntsville, Texas 77341 | (936) 294-1111 | (866) BEARKAT
© Copyright Sam Houston State University | All rights reserved. | A Member of The Texas State University System