
Data Science

This guide provides resources related to data science.

Getting started with Python

If you're a student in Data Science, you'll be learning Python through your coursework. The resources here are meant to supplement that learning and to give you avenues for pursuing more specific interests (e.g., machine learning or web scraping).

If you are not a Data Science student, these resources are still useful! Learning a programming language can help automate your research, whether you're working in biology, physics, social science, or some other domain. For those new to programming in general, the "Introductory Python tutorials" section is the place to start.

Download Python

First things first, you'll need to download Python, which is free. You can download Python by itself from the Python Software Foundation (python.org). Alternatively, you can install the Anaconda distribution of Python. Anaconda bundles Python, Jupyter Notebook, many pre-installed Python packages, and more, making it easy to get started quickly.
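Once Python is installed, a quick way to confirm that it works is to run a short script that reports which interpreter is in use. The following is a minimal sketch; the function name describe_python is made up for illustration and is not part of Python or Anaconda.

```python
import platform
import sys


def describe_python():
    """Return a one-line summary of the Python interpreter running this script."""
    # sys.version_info holds the major, minor, and micro version numbers.
    version = ".".join(str(part) for part in sys.version_info[:3])
    # sys.executable is the path to the interpreter, which helps tell a
    # python.org install apart from an Anaconda one.
    return f"Python {version} ({platform.python_implementation()}) at {sys.executable}"


print(describe_python())
```

If the printed path points inside an Anaconda folder, you are running the Anaconda copy of Python; otherwise you are most likely running a standalone installation.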

Introductory Python tutorials

Advanced Python tutorials

We have quite a few advanced Python books available through the library. Some of these are only accessible via a physical book copy, but many are available as e-books.


In the meantime, these books may be useful.

Python Libraries

One of the main benefits of Python is the vast array of pre-existing packages (also called libraries), written by other Python users and available for installation. You can find Python packages on PyPI, the Python Package Index. 

This overview of popular Python libraries provides a starting point for finding applicable libraries. For more advanced users, this comprehensive list of packages by topic includes links to further resources. 

If you use the Anaconda distribution of Python, many libraries come pre-installed. This tutorial covers the steps needed to install additional packages.
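Before installing anything, it can help to check which packages your Python installation already has. The sketch below uses the standard library's importlib.metadata module (available in Python 3.8 and later). The helper name installed_version and the three package names are examples chosen for illustration, not a recommended list.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Example package names only; replace them with the libraries you need.
for name in ["numpy", "pandas", "matplotlib"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

Note that the name used for installation can differ from the name used in an import statement (for example, scikit-learn is imported as sklearn), so pass the installation name to this helper.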

Here are some resources for popular data science Python libraries: 

Read here for an overview of some of the other data science-related packages, including wget (downloading files from the web), FlashText (cleaning text for natural language processing), PyFlux (time series data), and more.


Newton Gresham Library | (936) 294-1614 | (866) NGL-INFO

Sam Houston State University | Huntsville, Texas 77341 | (936) 294-1111 | (866) BEARKAT
© Copyright Sam Houston State University | All rights reserved. | A Member of The Texas State University System