Foundations of Data Science with Python

by John M. Shea

Cover image for the published book Foundations of Data Science with Python by John M. Shea

Available now! Here is a link to purchase from Amazon[1].


About the book

This book is an introduction to the foundations of data science, including data visualization, statistics, probability, and dimensionality reduction. It is targeted toward engineers and scientists, but it should be accessible to anyone who knows basic calculus and the basics of computer programming. By leveraging this background knowledge, the book fills a unique niche among books on data science and statistics:

  • This book applies a modern, computational approach to working with data and, in particular, uses simulations (an approach called resampling) to answer statistical questions.

    • Many books on statistics (especially those for engineers) teach a theoretical approach to answering statistical questions that many learners find difficult to understand. By contrast, most learners find it much easier to understand how resampling works than to understand some arcane formula.


  • This text provides a basic, but rigorous, introduction to probability and its application to statistics.

    • Some of the other books that use the resampling approach to statistics omit the mathematical foundations because they are targeted toward a broader audience who may not have the rigorous mathematical background of engineers and scientists.


  • This book provides an introduction to some of the most important libraries in the Python data stack, including NumPy, SciPy, Matplotlib, and Pandas.


  • Real data sets are used wherever practical.

    • Many statistics books use contrived examples so that the problems can be solved with a calculator, whereas the majority of the data sets used in this book are analyzed using computer programs.


  • The data sets and the questions asked are chosen to appeal to a broad audience.

    • Although the approach taken and the material covered are targeted toward engineers and scientists, I try to investigate questions that will appeal to most readers, and especially those that may appeal to college students.


  • The book has a unique set of interactive materials, including interactive quizzes and animated flashcards.

    • These are available on this website. See the discussion and examples below.
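To give a flavor of the resampling approach mentioned in the first bullet above, here is a minimal sketch of one common form of resampling, the bootstrap, used to estimate a confidence interval for a mean. It is not taken from the book: the function name `bootstrap_mean_ci`, its parameters, and the sample values are all made up for illustration, and the only library assumed is NumPy.

```python
import numpy as np


def bootstrap_mean_ci(data, num_resamples=10_000, confidence=0.95, seed=0):
    """Estimate a confidence interval for the mean by bootstrap resampling.

    Hypothetical helper for illustration only; not the book's code.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Each row is one resample: len(data) values drawn from the data
    # with replacement.
    resamples = rng.choice(data, size=(num_resamples, data.size), replace=True)

    # The mean of each resample gives a simulated distribution of the mean.
    resample_means = resamples.mean(axis=1)

    # Take the central `confidence` fraction of the simulated means.
    alpha = 1 - confidence
    lower, upper = np.quantile(resample_means, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Made-up measurements, used only to exercise the function.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]
lower, upper = bootstrap_mean_ci(sample)
print(f"sample mean: {np.mean(sample):.3f}")
print(f"approximate 95% interval for the mean: [{lower:.3f}, {upper:.3f}]")
```

The point of the sketch is that the interval comes from simulating many resampled data sets and looking at how the statistic varies, rather than from a closed-form formula.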

About this website

This website contains material that could not be included in the book itself, including:

  • Interactive tools to help students learn the material, including:

    • Interactive self-assessment quizzes via JupyterQuiz

    • Interactive flashcards to aid in learning terminology via JupyterCards

  • Animations and interactive visualizations

  • Problem sets for homework or additional practice (Coming soon!)

  • Errata for the book (When available)

  • A list of websites and books for those who want to continue their learning

Please feel free to open an issue on GitHub with suggested materials (especially problems for the interactive quizzes or homework/practice problem sets): jmshea/Foundations-of-Data-Science-with-Python

Examples of interactive materials

Flashcards

You can practice all of the flashcards from this book: Data Science Flashcards.