This guide provides resources related to data science.

R is increasingly popular in data science research and is often listed in job postings as a requirement or preference alongside Python. For those interested in learning a little R, these resources are here to help.

__Download R__

The first step is to **download R.** It is also recommended to **download RStudio**, the integrated development environment (IDE) for R. Both R and RStudio are free.

- R for Reproducible Scientific Analysis (Software Carpentry): This set of lessons from Software Carpentry is an introduction to R for people with no programming background. It introduces R, the RStudio interface, working with data structures, organizing/subsetting data, making plots, and creating reports. This is a great "get up to speed quickly" set of lessons that use the same data throughout.
- Programming with R (Software Carpentry): Also from Software Carpentry, this set of lessons is more focused on programming basics and best practices (functions, loops, conditionals, etc.).
- Introduction to R: This is an official R manual. If you want to learn R technicalities top to bottom, this is the right place.
- RStudio's online learning guide: RStudio provides an extensive set of links to learning resources. From getting started with R, to making interactive plots with Shiny, to R code best practices, this page is definitely worth a look.

- R for Data Science by Hadley Wickham; Mine Çetinkaya-Rundel; Garrett Grolemund. ISBN: 9781492097402. Publication Date: 2023-07-18. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience.

We have quite a few R books available through the library. Some of these are only accessible via a physical book copy, but many are available as e-books.

The following books may be a useful starting point.

- Beginning Data Science in R by Thomas Mailund. ISBN: 9781484226704. Publication Date: 2017-03-13. Beginning Data Science in R details how data science is a combination of statistics, computational science, and machine learning. You'll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R.
- Advanced R by Hadley Wickham. ISBN: 9781466586970. Publication Date: 2014-09-25. An Essential Reference for Intermediate and Advanced R Programmers. This book not only helps current R users become R programmers but also shows existing programmers what's special about R. Intermediate R programmers can dive deeper into R and learn new strategies for solving diverse problems while programmers from other languages can learn the details of R and understand why R works the way it does.
- Tidy Modeling with R by Max Kuhn; Julia Silge. ISBN: 9781492096481. Publication Date: 2022-08-16. This practical introduction shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach for your work. Learn the steps necessary to build a model from beginning to end; understand how to use different modeling and feature engineering approaches fluently; examine the options for avoiding common pitfalls of modeling, such as overfitting; learn practical methods to prepare your data for modeling; tune models for optimal performance; and use good statistical practices to compare, evaluate, and choose among models.
- Text Mining with R by Julia Silge; David Robinson. ISBN: 9781491981658. Publication Date: 2017-07-18. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows.

- R Markdown by Yihui Xie; J.J. Allaire; Garrett Grolemund. ISBN: 9780429782978. Publication Date: 2018-07-27. R Markdown: The Definitive Guide is the first official book authored by the core R Markdown developers that provides a comprehensive and accurate reference to the R Markdown ecosystem. With R Markdown, you can easily create reproducible data analysis reports, presentations, dashboards, interactive applications, books, dissertations, websites, and journal articles. (Note: this ebook is updated as functionality in R Markdown changes.)

One of the main benefits of R is the vast array of pre-existing packages (also called libraries) written by other R users and available for installation.

All official R packages are available through **CRAN** (Comprehensive R Archive Network). There are a *lot* of R packages available; this list of **recommended R packages** is a good starting point.
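As a minimal sketch of how installing a package from CRAN typically works: the helper name `ensure_package` and the choice of the `https://cloud.r-project.org` mirror are illustrative assumptions, not something this guide prescribes. The helper installs a package only when it is missing, so it is safe to run repeatedly.

```r
# Hypothetical helper (name chosen for illustration): install a package
# from CRAN only if it is not already available on this machine.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # needs an internet connection
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call succeeds without downloading anything.
ensure_package("stats")
```

For a CRAN package such as dplyr, the same pattern would be `ensure_package("dplyr")` followed by `library(dplyr)` at the start of each R session.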

Here are some resources for popular data science-related R packages. Also be sure to check out **all of RStudio's cheatsheets**.

- dplyr: a package for data manipulation (data wrangling).
- tidyr: a package for reshaping data into "tidy" data.
- lubridate: a package for working with date-time data.
- data.table: uses an "enhanced version of data.frame" to speed up manipulations and calculations.
- ggplot2: the most popular and widely used data visualization library for R.
- Shiny: a package for creating interactive web applications and graphics.
- tidytext: a text mining package.
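To give a feel for what "data manipulation" with dplyr looks like, here is a short sketch. It assumes dplyr is already installed and uses `mtcars`, a small data set that ships with R; the variable name `summary_by_cyl` and the 15 mpg cutoff are arbitrary choices for illustration.

```r
# Assumes dplyr is already installed, e.g. via install.packages("dplyr").
library(dplyr)

# mtcars is a small example data set bundled with R.
summary_by_cyl <- mtcars %>%
  filter(mpg > 15) %>%               # keep cars above 15 miles per gallon
  group_by(cyl) %>%                  # one group per cylinder count
  summarise(mean_mpg = mean(mpg),    # average fuel economy per group
            cars = n())              # number of cars per group

print(summary_by_cyl)
```

Each step takes a data frame and returns a data frame, which is why the verbs chain together with the `%>%` pipe.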

- Last Updated: Jun 3, 2024 1:57 PM
- URL: https://shsulibraryguides.org/datascience

**Newton Gresham Library | (936) 294-1614 | (866) NGL-INFO**

Sam Houston State University | Huntsville, Texas 77341 | (936) 294-1111 | (866) BEARKAT
