Discover the Best Resources to Learn Data Science

Image from Pixabay — no attribution required

You must have heard about data science a lot nowadays. And why not? It is one of the hottest jobs of the 21st century.

Do you want to be a data scientist? There are tons of courses available online that will help you to get started. But which one to choose? In this article we recommend some great courses and books that will help you become a data scientist. For our recommendation, we considered the cost of the course and the knowledge you get from the course, prioritizing those that can be taken for free.

What is Data Science?

Data Science is a multidisciplinary field that uses algorithms, statistics and scientific methods to extract knowledge and insights from structured and unstructured data. It takes concepts and theories from multiple fields such as computer science, mathematics, machine learning, statistics and information science to solve complex problems.

Data Scientist is a term used for data professionals who are skilled in data science, i.e., in organizing, analyzing and drawing inferences from large amounts of data. These professionals can identify problems and relevant questions, collect and organize data from various sources to create solutions and then communicate them effectively to organizations.

What are the skills required to become a data scientist?

  • Programming - languages like R and Python
  • Statistics - forecasting, descriptive and inferential statistics
  • Machine Learning algorithms - e.g., regression, decision trees, clustering
  • Data visualization - tools like Tableau and Excel, as well as Python and R
  • Database handling - SQL, NoSQL, among others
  • Software Engineering
  • Data Wrangling - the process of cleaning, restructuring and enriching raw data into a usable format
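
As a rough illustration of the data wrangling skill in the list above, here is a minimal sketch in plain Python. The records, field names and the `clean_record` helper are invented for this example and do not come from any of the courses reviewed here.

```python
# Hypothetical raw records: inconsistent casing, stray whitespace, a missing value.
raw_records = [
    {"name": "  Alice ", "age": "34", "city": "london"},
    {"name": "BOB", "age": "", "city": " Paris"},
    {"name": "carol", "age": "29", "city": "Berlin "},
]


def clean_record(record):
    """Return a cleaned copy of one raw record."""
    age_text = record["age"].strip()
    return {
        "name": record["name"].strip().title(),
        # Keep a missing age as None instead of guessing a value.
        "age": int(age_text) if age_text else None,
        "city": record["city"].strip().title(),
    }


# Cleaning and restructuring: normalize every record.
cleaned = [clean_record(r) for r in raw_records]

# Enriching: add a derived field that flags incomplete rows.
for row in cleaned:
    row["complete"] = row["age"] is not None

for row in cleaned:
    print(row)
```

In practice you would probably do this with a library such as pandas, but the steps (clean, restructure, enrich) stay the same.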

Acquiring these skills will certainly take you some time, but fortunately, there are many resources available online to get you started. How to select the right course? How and where to start? This is exactly what we will discuss in this article.

We will cover the following courses and books in this blog, along with some additional resources.

Disclaimer: Opinions stated in the article are my own and I am not financially compensated for any of the links in this blog. This blog does not contain affiliate links.

Content

Courses

  1. Coursera - Data Science Specialization by Johns Hopkins University
  2. Coursera - Data Science Professional Certificate - IBM
  3. Path Data Scientist - DataQuest
  4. edX - Professional Certificate in Data Science - Harvard University
  5. Codecademy - Become a Data Scientist
  6. Udemy - Complete Data Science Bootcamp
  7. Udemy - Data Science A to Z

Books

  1. O’Reilly — Data Science from Scratch
  2. O’Reilly — Doing Data Science
  3. Roger D. Peng and Elizabeth Matsui - The Art of Data Science

Additional Resources

  1. The Data Science Handbook
  2. The Open Source Data Science Masters
  3. Harvard University - Online Data Science Courses

Courses

There are many online courses to help you build the right skills for data science. You can find them on e-learning websites like Coursera, edX and Udemy. One good thing about online courses is that they can be highly interactive, which can help you understand and remember concepts more easily.

To be a successful data scientist, you need to acquire the skills and knowledge listed earlier in this article.

Now let's dive in and learn about some excellent online courses out there.

1. Data Science Specialization — Coursera (Johns Hopkins University)

Time Required: Approx. 8 Months, but you can do it quicker or slower if you don’t want a certificate.

Skills you will gain:

  • R programming
  • Use of GitHub
  • Data handling and Exploratory Data Analysis
  • Machine Learning
  • How to make statistical inference
  • Regression Analysis
  • How to build data products

Total Number of Courses in this Specialization: 10

  1. The Data Scientist's Toolbox
  2. R Programming
  3. Getting and Cleaning Data
  4. Exploratory Data Analysis
  5. Reproducible Research
  6. Statistical Inference
  7. Regression Models
  8. Practical Machine Learning
  9. Developing Data Products
  10. Data Science Capstone

About:

This Specialization covers almost every aspect of the data science pipeline. It provides both a conceptual and a practical introduction to data science, focusing on the tools and concepts you need to build a successful data science project, such as R programming, gathering data from various sources like the web and databases, and presenting data.

The Specialization also teaches the use of tools like GitHub, which is a platform for discovering, sharing and building software. Data scientists use GitHub to collaborate and to track changes in their work, and they can even roll back changes if required. You also learn how to make inferences from data, ask data-related questions and apply the skills you learn to solve a data-related problem.

To complete this Specialization you need to finish all the courses and finally submit a capstone project where you will apply the skills that you gained to a real-world problem (note that capstone projects are only available if you pay the subscription).

The Specialization uses R as the programming language. You will learn about R from scratch, both general programming principles and how to use R for data science and statistical computing. The courses will also teach you how to gather data from various sources and how to process data so that it can be used in machine learning projects.

Later courses in the Specialization teach exploratory data analysis, helping you summarize your data with visualization and exploratory techniques. They also cover the tools and concepts required for making statistical inferences from data, learning about regression models and using practical machine learning algorithms with R.

Finally you will learn about building data products that will help you tell the story derived from the data. After you learn these skills, you can submit a capstone project. These projects are based on real-world problems and are conducted together with industry, government and academic partners.

Pros:

  • Good course for beginners.
  • No prior programming experience required.
  • Does not shy away from mathematics as much as other courses do.
  • It covers most skills required to tackle a data science project, helping you get started in the field of data science.
  • Capstone Projects allow you to apply your skills to a real-world problem, giving you an overview of the entire data science pipeline.

Cons:

  • Basic understanding of statistics and probability required.

2. Data Science Professional Certificate — Coursera (IBM)

Time Required: Approx. 3 Months

Skills you will gain:

  • Python programming
  • Data visualization
  • Machine Learning
  • Data analysis
  • Databases and SQL

Total Number of Courses in this Certificate: 9

  1. What is Data Science
  2. Data Science Open Source Tools
  3. Data Science Methodology
  4. Python for data science and AI
  5. Database and SQL for Data Science
  6. Data Analytics with Python
  7. Data Visualization with Python
  8. Machine Learning with Python
  9. Data Science Capstone

About:
This is a beginner-level set of courses that gives you the basic skills to get started with data science. It describes the open-source tools commonly used in data science, like RStudio, Apache Zeppelin and Jupyter Notebook. The courses also teach Python programming, the use of databases, and concepts around data analysis, data visualization and machine learning.

You will dive into machine learning algorithms such as Linear and Logistic Regression, K-Nearest Neighbors, Decision Trees, Support Vector Machines, Partition-based Clustering and Hierarchical Clustering. Finally, you will get hands-on experience with IBM Cloud while building a data science capstone project.
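
To give a feel for one of the algorithms named above, here is a small K-Nearest Neighbors sketch in plain Python. It is not taken from the IBM courses; the toy points, the labels and the `knn_predict` function are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Toy training data (invented): (feature vector, label).
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"),
    ((6.5, 7.0), "B"),
    ((7.0, 6.5), "B"),
]


def knn_predict(point, data, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort the training points by Euclidean distance to the query point.
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict((1.2, 1.4), training))
print(knn_predict((6.8, 6.2), training))
```

A course would normally use a library implementation and real data; the point here is only that the idea behind the algorithm fits in a few lines.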

Pros:

  • The content is well designed.
  • Good hands-on work on fundamental concepts.
  • No prior programming experience required.

Cons:

  • This is only a beginner course, so you will need to study these topics in more depth elsewhere.

3. Path Data Scientist — DATAQUEST

Time Required: Self-paced

Skills you will gain:

  • Python programming for Data Science
  • Data Visualization
  • Data cleaning and data analysis
  • Databases and SQL
  • Web Scraping
  • Concepts on Statistics, Probability and Linear Algebra
  • Machine Learning
  • Data Structures and Algorithms
  • Git and Version Control
  • Spark and Map Reduce

Total Number of Courses in this Track: 35

  1. Python for Data Science and Fundamentals
  2. Python for Data Science Intermediate
  3. Pandas and NumPy Fundamentals
  4. Exploratory Data Visualization
  5. Storytelling through Data Visualization
  6. Data Cleaning and Analysis
  7. Data Cleaning in Python — Advanced
  8. Data Cleaning Project Walk Through
  9. Elements of the command line
  10. Text Processing in the command line
  11. SQL fundamentals
  12. SQL Intermediate
  13. SQL Advanced
  14. APIs and Web Scraping
  15. Statistics and Fundamentals
  16. Statistics Intermediate
  17. Probability Fundamentals
  18. Conditional Probability
  19. Hypothesis testing and fundamentals
  20. Machine Learning Fundamentals
  21. Calculus for Machine Learning
  22. Linear Algebra for Machine Learning
  23. Machine Learning with Python — Intermediate
  24. Decision Trees: Python
  25. Deep Learning Fundamentals
  26. Machine Learning Project
  27. Kaggle Fundamentals
  28. Exploring topics in Data Science
  29. Natural Language Processing
  30. Functions: Advanced
  31. Data Structures and Algorithms
  32. Python Programming: Advanced
  33. Command-line: Intermediate
  34. Git and Version Control
  35. Spark and Map Reduce

About:
This track is one of the best you can find online to learn about data science. The courses are incredibly detailed. These courses will teach you each and every aspect required to become a data scientist. The courses will teach you Python fundamentals and more advanced concepts, Data Analysis, Data Visualization and how to query Databases with SQL. They also dive into mathematics a lot more than other courses.

You will learn about statistics, linear algebra, calculus and probability. The later part of this track covers machine learning algorithms in detail, data structures and a bit of Natural Language Processing. These chapters will teach you to extract information from data and to apply machine learning algorithms effectively, based on the data type and the organization's requirements.

These courses also cover Spark and MapReduce, which are technologies for analyzing large datasets, as well as how to use Git and version control to keep track of your projects.
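
The MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is only a single-process illustration with invented sample data and hypothetical function names (`map_phase`, `shuffle`, `reduce_phase`); real frameworks distribute the same steps across many machines.

```python
from collections import defaultdict

# Invented sample documents standing in for a large dataset.
documents = [
    "data science is fun",
    "science needs data",
    "big data needs tools",
]


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can run in parallel, which is what makes the pattern suitable for large datasets.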

Note that you need a premium subscription to get access to all the courses of this data science track. On the plus side, with a premium membership you will get a monthly call with a mentor who will be your guide, review your resume and give you advice.

Pros:

  • Reasonable Price for the value
  • Highly Detailed
  • Covers almost every topic required to become a data scientist.
  • No prior experience programming for data science required.
  • Includes big data handling techniques such as Map-Reduce.
  • Mentorship (Premium)
  • All portfolio projects are guided, i.e., instructions help you through the project development (Premium)
  • Resume review (Premium)
  • Focus on the actual implementation of the math behind machine learning and not just importing a library.

Cons:

  • Chapters like Spark and Map Reduce may be difficult to grasp for beginners.
  • The certificate is not worth much to employers, but the track will help you build a great portfolio.

4. Professional Certificate in Data Science — edX (Harvard University)

Time Required: Approx. 1 year and 5 months

Skills you will gain:

  • R programming
  • Data Visualization
  • Mathematics
  • Productivity Tools
  • Wrangling
  • Machine Learning

Total Number of Courses in this Certification: 9

  1. R Basics
  2. Data Visualization
  3. Probability
  4. Inference and Modelling
  5. Productivity Tools
  6. Wrangling
  7. Linear Regression
  8. Machine Learning
  9. Capstone

About:
This course covers the basic skills required to be a successful data scientist, such as R programming, statistical concepts (probability and inference), data visualization and data wrangling. It also helps you get familiar with the tools required for practicing data science, such as Unix/Linux, RStudio, Git and GitHub.

The course material is enough to give you the knowledge base in data science needed to tackle real-world problems. It includes real-world case studies to help you understand how data science is applied. The case studies in this certification are Trends in World Health and Economics, US Crime Rates, The Financial Crisis of 2007–2008, Election Forecasting, Building a Baseball Team (inspired by Moneyball), and Movie Recommendation Systems.

Pros:

  • Equips you with all the basic skills.
  • Includes real-world case studies.
  • No prior experience programming for data science required.
  • Good Hands-on experience

Cons:

  • A very short course in terms of topics covered.
  • Pricey

5. Become a Data Scientist — Codecademy

Time Required: Approx. 35 weeks

Skills you will gain:

  • SQL
  • Python Programming
  • Data Analysis
  • Data Visualization
  • Statistics
  • Web Scraping
  • Machine Learning

Total Number of Courses in this Track: 26

  1. The importance of Data and SQL Basics
  2. SQL: Basics
  3. SQL: Intermediate
  4. Go Off-Platform with SQL
  5. Analyze real data with SQL
  6. Python functions and logic
  7. Python Lists and Loops
  8. Advanced Python
  9. Python Cumulative Project
  10. Data Analysis with Python
  11. Data Visualization
  12. Visualization Cumulative Projects
  13. Data Visualization Capstone Project
  14. Learn Statistics with Python
  15. Introduction to Statistics with NumPy
  16. Hypothesis Testing with SciPy
  17. Practical Data Cleaning
  18. Data Analysis Capstone Projects
  19. Learn Web Scraping with Beautiful Soup
  20. Machine Learning: Supervised Learning
  21. Supervised Machine Learning Cumulative Project
  22. Machine Learning: Unsupervised Learning
  23. Unsupervised Machine Learning Cumulative Project
  24. Perceptrons and Neural Nets
  25. Machine Learning Capstone Project
  26. Natural Language Processing

About:
This track is a good way to get started with the skills required for data science. The course contains well-explained content with code-along exercises, guided projects and quizzes. It starts with basic SQL (basic syntax, SQL tables, SQL queries) and Python (functions, loops and data structures). It then moves towards more advanced concepts, such as using SQL to analyze real organization data and using Python for statistics and data cleaning, which will help you tackle real-world data science projects.

The course also covers data analysis, data visualization, statistics and machine learning. The good thing about the certification is that it teaches you all the topics with guided projects and with the Python packages that are important to get hands-on in data science.

This track also teaches machine learning to a good extent, covering supervised and unsupervised learning, neural networks and Natural Language Processing. You will also get hands-on with SciPy, a Python package used here for hypothesis testing, and with web scraping using Beautiful Soup, a Python package for parsing HTML and XML documents that is used to extract data from websites.
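
To show what extracting data from an HTML document looks like, here is a minimal sketch. Note that it deliberately uses Python's built-in `html.parser` module instead of Beautiful Soup, so that it runs without installing anything; the HTML snippet and the `LinkCollector` class are invented for this example.

```python
from html.parser import HTMLParser

# Invented HTML snippet; a real scraper would download the page first.
page = """
<html><body>
  <h1>Courses</h1>
  <a href="/python">Learn Python</a>
  <a href="/sql">Learn SQL</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples.
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Text seen while inside an <a> tag belongs to that link.
        if self._current_href is not None and data.strip():
            self.links.append((self._current_href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

Beautiful Soup wraps this kind of parsing in a friendlier interface, which is why courses tend to teach it rather than the low-level parser.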

Pros:

  • Well-detailed course.
  • Good value for money.
  • Easy to use and navigate
  • No prior experience programming for data science required.
  • Good Hands-on experience with well-guided projects

Cons:

  • Free courses in this track are a bit too general.
  • Some topics may seem a bit confusing, for example hypothesis testing with SciPy.

6. Complete Data Science Bootcamp — Udemy

Time Required: Self-paced (29 Hours of video lectures, consider adding extra time to do the exercises)

Skills you will gain:

  • Python programming
  • Mathematics (Statistics and Probability)
  • Statistical Analysis
  • Deep Learning Frameworks
  • Machine Learning

Total Number of Chapters in this Course: 8

  1. Introduction to Data Science
  2. Probability
  3. Statistics
  4. Introduction to Python
  5. Advanced statistical methods with Python
  6. Mathematics
  7. Deep Learning
  8. Case Studies

About:
This course is one of the best-selling courses on Udemy, with a 4.5-star average user rating and more than 201,049 students enrolled. It starts with the basics (introduction to probability and statistics) and assumes no programming experience. You will learn Python programming and how Python is used in data science.

This course also covers a great deal of mathematics which is important for a data scientist, such as the mathematics behind machine learning algorithms, probability distributions, descriptive statistics, inferential statistics, and more. You will also learn how to apply statistical methods with Python on your data. Finally, you will use the acquired skills to solve real-world case studies.

Pros:

  • No programming experience required
  • Active Q&A support
  • Access to future updates
  • Real-world case studies

Cons:

  • Does not cover topics like linear algebra, calculus, data wrangling, and the use of Git and GitHub, which are valuable in data science.

7. Data Science A to Z – Udemy

Time Required: Self-paced (21 hours of video lectures, consider adding extra time to do the exercises)

Skills you will gain:

  • Tableau
  • SSIS (a component of the Microsoft SQL Server database software that can be used to perform a broad number of data migration tasks)
  • Gretl (Open Source statistical package)
  • SQL for data science
  • Data Wrangling
  • Data Visualization
  • Communication

Total Number of Chapters in this Course: 4

  1. Data Visualization
  2. Modelling
  3. Data Preparation
  4. Communication

About:
This is a beginner course for data science enthusiasts. You will learn about data mining with Tableau (a data visualization and analysis tool), how to apply machine learning algorithms such as linear regression and logistic regression, and how to use various evaluation metrics to measure the performance of machine learning algorithms and statistical models.
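
As a small example of the evaluation metrics mentioned above, the sketch below computes accuracy, precision and recall for a binary classifier in plain Python. The labels are invented, and this is not code from the course, which works with its own tools.

```python
# Invented labels: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))

# Count the four possible outcomes of a binary prediction.
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are real
recall = tp / (tp + fn)            # share of real positives that were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Which metric matters most depends on the problem: precision when false alarms are costly, recall when missing a positive case is costly.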

You will learn how to handle data with SQL and how to do basic visualization with Tableau. Along with this, you will also learn about data wrangling and manipulation. The added advantage of this course is that it also teaches you how to communicate your project effectively to business people, which is one of the important skills required of a data scientist.

Pros:

  • Gives you good foundational knowledge about the topics covered.
  • Teaches you about various tools from the data science toolbox, such as Tableau, SSIS and Gretl.
  • Teaches communication, which is an essential skill for a data scientist.

Cons:

  • Does not cover the majority of topics required for a data scientist, such as basic programming with R and Python, probability, statistics and many machine learning algorithms.
  • Not a course for beginners who do not have experience with basic programming, statistics and probability.
  • Covers few programming languages such as R or Python; the focus is more on data science tools.
  • No real-world projects or case studies.

Books

There are tons of books out there on data science. Some are tool-specific, such as those on using R or Python for data science, and others focus on the basic skills the field requires. These books should be able to get you started with data science projects, and these are the ones we recommend:

1. Data Science from Scratch — O’Reilly

Content:

  1. Introduction to data science
  2. Crash Course in Python
  3. Visualizing Data
  4. Linear Algebra
  5. Statistics
  6. Probability
  7. Hypothesis and Inference
  8. Gradient Descent
  9. Getting Data
  10. Working with data
  11. Machine Learning
  12. K-nearest neighbors
  13. Naive Bayes
  14. Simple Linear Regression
  15. Multiple Regression
  16. Logistic Regression
  17. Decision Trees
  18. Neural Networks
  19. Deep Learning
  20. Clustering
  21. Natural Language Processing
  22. Network Analysis
  23. Recommender Systems
  24. Databases and SQL
  25. Map Reduce
  26. Data ethics
  27. Go forth and do data science

About:
This book is for people with some knowledge of programming (in any language), but Python is not a prerequisite as it starts with a crash course in Python. Most of the book focuses on machine learning algorithms, providing a good understanding of these algorithms along with their implementations. It teaches the algorithms from scratch: instead of relying on Python libraries like scikit-learn, you implement each algorithm manually. It is therefore a good book for learning how various machine learning algorithms actually work. The book also covers many of the basic topics for data science, such as Python programming, data cleaning and visualization, and you will be able to get your hands dirty with some data science projects.
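
To illustrate what implementing an algorithm manually can look like, here is a short sketch of simple linear regression fitted by least squares in plain Python. It is written for this article and is not the book's own code; the function name and the sample data are assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    x_variation = sum((x - mean_x) ** 2 for x in xs)
    slope = co_variation / x_variation
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
```

Writing the formula out by hand like this makes it clear what a library call is doing for you behind the scenes.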

2. Doing Data Science — O’Reilly

Content:

  1. Introduction to Data Science
  2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
  3. Algorithms
  4. Spam Filters, Naive Bayes, and Wrangling
  5. Logistic Regression
  6. Time Stamps and Financial Modelling
  7. Extracting Meaning from Data
  8. Recommendation Engines: Building a User-Facing Data Product at Scale
  9. Data Visualization and Fraud Detection
  10. Social Networks and Data Journalism
  11. Causality
  12. Epidemiology
  13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
  14. Data Engineering: MapReduce, Pregel, and Hadoop
  15. The Students Speak
  16. Next-Generation Data Scientists, Hubris, and Ethics

About:
Doing Data Science from O’Reilly is another great book for data science beginners. It strikes a good balance between statistics and machine learning. In this book, you will find various data science case studies from data scientists at Google, Microsoft and eBay, as well as sample code and exercises to help you learn the concepts. It also covers topics like MapReduce, Hadoop and Pregel (a system for large-scale graph processing), which are not covered in most other books or online courses.

The initial chapters will help you gain the skills required for data exploration and making inferences from data. Middle chapters are more focused on machine learning, recommendation systems, data visualization and model evaluation techniques. Later chapters teach about Data Engineering, an aspect of data science that focuses on the practical application of data collection and analysis, which is a very important topic in data science, especially when you are handling enormous amounts of data.

This book is written from a statistical perspective and is enough to get you started with data science.

3. The Art of Data Science (Roger D. Peng and Elizabeth Matsui)

Content:

  1. Data Analysis as Art
  2. Epicycles of Analysis
  3. Stating and Refining the Question
  4. Exploratory Data Analysis
  5. Using Models to Explore Your Data
  6. Inference: A Primer
  7. Formal Modeling
  8. Inference vs. Prediction: Implications for Modeling Strategy
  9. Interpreting Your Results
  10. Communication

About:
This book was written as a companion to the Coursera Johns Hopkins specialization. It is a high-level overview of the data science workflow. It is a good book for students who have no practical experience in data science. The book contains code snippets in the R programming language to help you follow the chapters.

This book does a great job of explaining data analysis and how to interpret results. Though it lacks some of the important topics of data science, it is useful for beginners to learn about data science and the common pipeline used when working on a data science project.

Additional Resources

Here are some additional resources that can be helpful for you in your data science journey.

The Data Science Handbook: A comprehensive overview of data science covering data analytics, programming and business skills necessary to master the discipline.

The Open Source Data Science Masters: It is an open-source collection of books, courses and articles to learn data science, with content around both the theory and technologies of data science.

Harvard Data Science Courses: Collection of data science courses created by Harvard University.

Lead Data Scientist, author of “Python Feature Engineering Cookbook”, instructor of online courses on machine learning and developer of open-source Python code.
