# Discover the Best Resources to Learn Data Science

You must have heard about data science a lot nowadays. And why not? It is one of the hottest jobs of the 21st century.

Do you want to be a data scientist? There are tons of courses available online that will help you to get started. But which one to choose? In this article we recommend some great courses and books that will help you become a data scientist. For our recommendation, we considered the cost of the course and the knowledge you get from the course, prioritizing those that can be taken for free.

# What is Data Science?

Data Science is a multidisciplinary field that uses algorithms, statistics and scientific methods to extract knowledge and insights from structured and unstructured data. It takes concepts and theories from multiple fields such as computer science, mathematics, machine learning, statistics and information science to solve complex problems.

*Data Scientist* is a term used for data professionals who are skilled in data science, i.e., in organizing, analyzing and drawing inferences from large amounts of data. These professionals can identify problems and relevant questions, collect and organize data from various sources to create solutions and then communicate them effectively to organizations.

# What are the skills required to become a data scientist?

- **Programming** - Languages like R and Python
- **Statistics** - Forecasting, Descriptive and Inferential Statistics
- **Machine Learning algorithms** - e.g., Regression, Decision Trees, clustering
- **Data visualization** - Tools like Tableau, Excel and also Python and R
- **Database Handling** - SQL, NoSQL, among others
- **Software Engineering**
- **Data Wrangling** - The process of cleaning, restructuring and enriching the raw data into a usable format
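To make the data wrangling step a bit more concrete, here is a minimal sketch that uses only Python's standard library. The sample records, the field names and the `clean_records` helper are all made up for this illustration; real projects often use a library such as pandas for this kind of work.

```python
import csv
import io

# Hypothetical raw data: inconsistent casing, stray spaces, a missing value.
RAW_CSV = """name,city,age
 Alice ,new york,34
BOB,Boston,
carol, boston ,29
"""

def clean_records(raw_text):
    """Clean, restructure and enrich raw CSV rows into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        age_text = row["age"].strip()
        if not age_text:
            continue  # cleaning: drop rows with a missing age
        age = int(age_text)
        rows.append({
            "name": row["name"].strip().title(),  # cleaning: normalize names
            "city": row["city"].strip().title(),  # cleaning: normalize cities
            "age": age,                           # restructuring: text -> int
            "is_over_30": age > 30,               # enriching: derived field
        })
    return rows

records = clean_records(RAW_CSV)
for record in records:
    print(record)
```

The row with the missing age is dropped here for simplicity; depending on the project, filling in a default or flagging the row could be a better choice.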

Acquiring these skills will certainly take you some time, but fortunately, there are many resources available online to get you started. How to select the right course? How and where to start? This is exactly what we will discuss in this article.

We will cover the following courses and books in this blog, along with some additional resources.

Disclaimer: Opinions stated in the article are my own and I am not financially compensated for any of the links in this blog. This blog does not contain affiliate links.

# Content

## Courses

- Coursera - **Data Science Specialization** by Johns Hopkins University
- Coursera - **Data Science Professional Certificate** - IBM
- Dataquest - **Path Data Scientist**
- edX - **Professional Certificate in Data Science** - Harvard University
- Codecademy - **Become a Data Scientist**
- Udemy - **Complete Data Science Bootcamp**

## Books

- O’Reilly - **Data Science from Scratch**
- O’Reilly - **Doing Data Science**
- Roger D. Peng and Elizabeth Matsui - **The Art of Data Science**

## Additional Resources

- **The Data Science Handbook**
- **The Open Source Data Science Masters**
- Harvard University - **Online Data Science Courses**

# Courses

There are many online courses to help you get the right skills for data science. You can find these courses on e-learning websites like Coursera, edX and Udemy. One good thing about online courses is that they can be highly interactive, which can help you understand and remember concepts more easily.

To be a successful data scientist, you need to acquire the skills and knowledge listed earlier in this article.

Now let's dive in and learn about some excellent online courses out there.

## 1. Data Science Specialization — Coursera (Johns Hopkins University)

**Time Required:** Approx. 8 Months, but you can do it quicker or slower if you don’t want a certificate.

**Skills you will gain:**

- R programming
- Use of GitHub
- Data handling and Exploratory Data Analysis
- Machine Learning
- How to make statistical inference
- Regression Analysis
- How to build data products

**Total Number of Courses in this Specialization:** 10

- The Data Scientist ToolBox
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Developing Data Products
- Data Science Capstone

**About:**

This Specialization covers almost every aspect of the data science pipeline. This Specialization provides both a conceptual and a practical introduction to data science, focusing on tools and concepts you require to build a successful data science project, like R programming, gathering data from various sources like the web and databases, and how to present data.

The Specialization also teaches the use of tools like GitHub, which is a platform for discovering, sharing and building software. Data scientists use GitHub for collaboration and to track changes in their work; they can even roll back changes if required. You also learn how to make inferences from the data, ask data-related questions and apply the skills you learn to solve a data-related problem.

To complete this Specialization you need to finish all the courses and finally submit a capstone project where you will apply the skills that you gained to a real-world problem (note that capstone projects are only available if you pay the subscription).

The Specialization uses R as the programming language. You will learn about R from scratch, both general programming principles and how to use R for data science and statistical computing. The courses will also teach you how to gather data from various sources and how to process data so that it can be used in machine learning projects.

Later courses in the Specialization teach exploratory data analysis, helping you summarize your data with visualization and exploratory techniques. They also cover the tools and concepts required for making statistical inferences from data, learning about regression models and using practical machine learning algorithms with R.

Finally you will learn about building data products that will help you tell the story derived from the data. After you learn these skills, you can submit a capstone project. These projects are based on real-world problems and are conducted together with industry, government and academic partners.

**Pros:**

- Good course for beginners.
- No prior programming experience required.
- Does not shy away from mathematics as much as other courses do.
- It covers most skills required to tackle a data science project, helping you get started in the field of data science.
- Capstone Projects allow you to apply your skills to a real-world problem, giving you an overview of the entire data science pipeline.

**Cons:**

- Basic understanding of statistics and probability required.

## 2. Data Science Professional Certificate — Coursera (IBM)

**Time Required:** Approx. 3 Months

**Skills you will gain:**

- Python programming
- Data visualization
- Machine Learning
- Data analysis
- Databases and SQL

**Total Number of Courses in this Certificate:** 9

- What is Data Science
- Data Science Open Source Tools
- Data Science Methodology
- Python for data science and AI
- Database and SQL for Data Science
- Data Analytics with Python
- Data Visualization with Python
- Machine Learning with Python
- Data Science Capstone

**About:**

This is a beginner-level course on data science. This set of courses will give you the basic skills that will help you get started with data science. It describes the open-source tools available and commonly used in data science like RStudio, Apache Zeppelin and Jupyter Notebook. The courses also teach Python programming, use of databases, and concepts around data analysis, data visualization and machine learning.

You will dive into Machine Learning algorithms, like Linear and Logistic Regression, K-Nearest Neighbors, Decision Trees, Support Vector Machines, Partition-based Clustering and Hierarchical Clustering. Finally, you will also get hands-on experience with IBM Cloud to build a Data Science capstone project.
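To give a feel for one of the algorithms named above, here is a minimal K-Nearest Neighbors sketch in plain Python. It is not taken from the IBM course material; the `knn_predict` function and the toy data are assumptions made up for this example, and in practice you would typically use a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])  # closest training points first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up 2D toy data: two loose clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.8, 1.6)))
print(knn_predict(points, labels, (6.2, 6.4)))
```

The sketch uses Euclidean distance (`math.dist`, available from Python 3.8) and an unweighted vote; other distance measures or weighting schemes are equally valid choices.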

**Pros:**

- The content is well designed.
- Good hands-on practice with fundamental concepts.
- No prior programming experience required.

**Cons:**

- The course is just a beginner course, so a lot more depth is required in these topics.

## 3. Path Data Scientist — Dataquest

**Time Required:** Self-paced

**Skills you will gain:**

- Python programming for Data Science
- Data Visualization
- Data cleaning and data analysis
- Databases and SQL
- Web Scraping
- Concepts on Statistics, Probability and Linear Algebra
- Machine Learning
- Data Structures and Algorithms
- Git and Version Control
- Spark and Map Reduce

**Total Number of Courses in this Track:** 35

- Python for Data Science and Fundamentals
- Python for Data Science Intermediate
- Pandas and NumPy Fundamentals
- Exploratory Data Visualization
- Storytelling through Data Visualization
- Data Cleaning and Analysis
- Data Cleaning in Python — Advanced
- Data Cleaning Project Walk Through
- Elements of the command line
- Text Processing in the command line
- SQL fundamentals
- SQL Intermediate
- SQL Advanced
- APIs and Web Scraping
- Statistics and Fundamentals
- Statistics Intermediate
- Probability Fundamentals
- Conditional Probability
- Hypothesis testing and fundamentals
- Machine Learning Fundamentals
- Calculus for Machine Learning
- Linear Algebra for Machine Learning
- Machine Learning with Python — Intermediate
- Decision Trees: Python
- Deep Learning Fundamentals
- Machine Learning Project
- Kaggle Fundamentals
- Exploring topics in Data Science
- Natural Language Processing
- Functions: Advanced
- Data Structures and Algorithms
- Python programming advanced
- Command-line: Intermediate
- Git and Version Control
- Spark and Map Reduce

**About:**

This track is one of the best you can find online to learn about data science. The courses are incredibly detailed. These courses will teach you each and every aspect required to become a data scientist. The courses will teach you Python fundamentals and more advanced concepts, Data Analysis, Data Visualization and how to query Databases with SQL. They also dive into mathematics a lot more than other courses.

You will learn about statistics, linear algebra, calculus and probability. The later part of this track covers machine learning algorithms in detail, data structures and a bit on Natural Language Processing. These chapters will teach you to extract information about data and to apply machine learning algorithms effectively, based on the data type and organization's requirements.

These courses also cover Spark and MapReduce, which are technologies for analysing large datasets, and show how to use Git and version control to keep track of your projects.
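If the MapReduce idea is new to you, the following single-machine sketch shows its three phases on a word-count task. It only mimics the programming model in plain Python and does not use Spark or Hadoop; the function names and sample documents are made up for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data is big", "data science uses data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on many machines, and the framework handles the shuffle; here everything runs in one process purely to show the data flow.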

Note that you need a premium subscription to get access to all the courses of this data science track. On the plus side, with a premium membership you will get a monthly call with a mentor who will guide you, review your resume and give you advice.

**Pros:**

- Reasonable Price for the value
- Highly Detailed
- Covers almost every topic required to become a data scientist.
- No prior experience in programming for data science required.
- Includes big data handling techniques such as MapReduce.
- Mentorship (Premium)
- Guided portfolio projects, with instructions that help you through the project development (Premium)
- Resume review (Premium)
- Focus on the actual implementation of the math behind machine learning and not just importing a library.

**Cons:**

- Chapters like Spark and Map Reduce may be difficult to grasp for beginners.
- The certificate is not worth much to employers but will help you build a great portfolio.

## 4. Professional Certificate in Data Science — edX (Harvard University)

**Time Required:** Approx. 1 year and 5 months

**Skills you will gain:**

- R programming
- Data Visualization
- Mathematics
- Productivity Tools
- Wrangling
- Machine Learning

**Total Number of Courses in this Certification:** 9

- R Basics
- Data Visualization
- Probability
- Inference and Modelling
- Productivity Tools
- Wrangling
- Linear Regression
- Machine Learning
- Capstone

**About:**

This course covers the basic skills required to be a successful data scientist, such as R programming, statistical concepts (probability and inference), data visualization and data wrangling. It also helps you get familiar with the tools required for practicing data science, such as Unix/Linux, RStudio, Git and GitHub.

The material of the course is enough to prepare you with the required knowledge base in data science to tackle real-world problems. This course includes real-world case studies to help you understand the data science application a little more. Case studies in this certification are Trends in World Health and Economics, US Crime Rates, The Financial Crisis of 2007–2008, Election Forecasting, Building a Baseball Team (inspired by Moneyball), and Movie Recommendation Systems.

**Pros:**

- Equips you with all the basic skills.
- Includes real-world case studies.
- No prior experience in programming for data science required.
- Good Hands-on experience

**Cons:**

- A very short course in terms of topics covered.
- Pricey

## 5. Become a Data Scientist — Codecademy

**Time Required:** Approx. 35 weeks

**Skills you will gain:**

- SQL
- Python Programming
- Data Analysis
- Data Visualization
- Statistics
- Web Scraping
- Machine Learning

**Total Number of Courses in this Track:** 26

- The importance of Data and SQL Basics
- SQL: Basics
- SQL: Intermediate
- Go Off-Platform with SQL
- Analyze real data with SQL
- Python functions and logic
- Python Lists and Loops
- Advanced Python
- Python Cumulative Project
- Data Analysis with Python
- Data Visualization
- Visualization Cumulative Projects
- Data Visualization Capstone Project
- Learn Statistics with Python
- Introduction to Statistics with NumPy
- Hypothesis Testing with SciPy
- Practical Data Cleaning
- Data Analysis Capstone Projects
- Learn Web Scraping with Beautiful Soup
- Machine Learning: Supervised Learning
- Supervised Machine Learning Cumulative Project
- Machine Learning: Unsupervised Learning
- Unsupervised Machine Learning Cumulative Project
- Perceptrons and Neural Nets
- Machine Learning Capstone Project
- Natural Language Processing

**About:**

This track is good to get you started with the skills required for data science. The course contains well-explained content with code-along, guided projects and quizzes. It starts with basic SQL (basic syntax, SQL tables, SQL queries) and Python (functions, loops and data structures). It then moves towards more advanced concepts such as using SQL for analyzing real organization data and using Python for statistics and data cleaning, which will help you tackle real-world data science projects.

The course also covers data analysis, data visualization, statistics and machine learning. The good thing about the certification is that it teaches you all the topics with guided projects and with the Python packages that are important to get hands-on in data science.

This track also teaches machine learning to a very good extent, covering supervised and unsupervised learning, neural networks and Natural Language Processing. You will also get hands-on with SciPy, a Python package used here for hypothesis testing, and with web scraping using Beautiful Soup, another Python package that parses HTML and XML documents and is used to extract data from websites.
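To show what parsing an HTML document looks like, here is a minimal sketch. Beautiful Soup is a third-party package, so to keep the example runnable without installing anything we swap in Python's built-in `html.parser` module instead. The `LinkCollector` class and the sample HTML are made up for this illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag together with the text inside it."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (href, text) tuples
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/courses/python">Python Basics</a></li>
  <li><a href="/courses/sql">SQL <b>Intermediate</b></a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

With Beautiful Soup the same task usually takes fewer lines, which is one reason it is popular for scraping; the standard-library version is shown only so the sketch has no dependencies.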

**Pros:**

- Well-detailed course.
- Good value for money.
- Easy to use and navigate
- No prior experience in programming for data science required.
- Good Hands-on experience with well-guided projects

**Cons:**

- Free courses in this track are a bit too general.
- Some topics may seem a bit confusing, for example hypothesis testing with SciPy.

## 6. Complete Data Science Bootcamp — Udemy

**Time Required:** Self-paced (29 hours of video lectures; consider adding extra time to do the exercises)

**Skills you will gain:**

- Python programming
- Mathematics (Statistics and Probability)
- Statistical Analysis
- Deep Learning Frameworks
- Machine Learning

**Total Number of Chapters in this Certification:** 8

- Introduction to Data Science
- Probability
- Statistics
- Introduction to Python
- Advanced statistical methods with Python
- Mathematics
- Deep Learning
- Case Studies

**About:**

This course is one of the best-selling courses on Udemy with a 4.5-star average user rating and more than 201,049 students enrolled. It starts with the basics (introduction to probability and statistics) and assumes no programming experience. You will learn Python programming and how Python is used in data science.

This course also covers a great deal of mathematics which is important for a data scientist, such as the mathematics behind machine learning algorithms, probability distributions, descriptive statistics, inferential statistics, and more. You will also learn how to apply statistical methods with Python on your data. Finally, you will use the acquired skills to solve real-world case studies.
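As a small taste of applying descriptive statistics with Python, here is a sketch that uses only the built-in `statistics` module. The sample numbers are made up, and this is not code from the Udemy course.

```python
import statistics

# Made-up sample: daily website visits over ten days.
visits = [120, 135, 128, 150, 142, 138, 131, 160, 125, 141]

summary = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "stdev": statistics.stdev(visits),  # sample standard deviation
    "min": min(visits),
    "max": max(visits),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For larger datasets, libraries such as NumPy or pandas offer the same summaries with better performance.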

**Pros:**

- No programming experience required
- Active Q&A support
- Access to future updates
- Real-world case studies

**Cons:**

- Does not cover topics like linear algebra, calculus, data wrangling, and the use of Git and GitHub, which are valuable in data science.

## 7. Data Science A to Z — Udemy

**Time Required:** Self-paced (21 hours of video lectures; consider adding extra time to do the exercises)

**Skills you will gain:**

- Tableau
- SSIS (a component of the Microsoft SQL Server database software that can be used to perform a broad number of data migration tasks)
- Gretl (Open Source statistical package)
- SQL for data science
- Data Wrangling
- Data Visualization
- Communication

**Total Number of Chapters in this Course:** 4

- Data Visualization
- Modelling
- Data Preparation
- Communication

**About:**

This course is a beginner course for data science enthusiasts. You will learn about data mining with Tableau (a data visualization and analysis tool), how to apply machine learning algorithms such as linear regression and logistic regression, and how to use various evaluation metrics to measure the performance of machine learning algorithms and statistical models.

You will learn how to handle data with SQL and how to do basic visualization with Tableau. Along with this, you will also learn about data wrangling and manipulation. An added advantage of this course is that it also teaches you how to effectively communicate your project to business people, which is one of the important skills required of a data scientist.
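To illustrate what handling data with SQL can look like, here is a minimal sketch that runs a query from Python through the built-in `sqlite3` module. The course itself uses Microsoft SQL Server tooling, so SQLite is our substitution here, and the `orders` table and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 15.0), ("carol", 50.0)],
)

# Aggregate: total spend per customer, largest first.
totals = connection.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
connection.close()

print(totals)
```

The `GROUP BY` and `ORDER BY` clauses shown here are standard SQL, so the same query should carry over to most other database systems with little or no change.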

**Pros:**

- Gives you good foundational knowledge about the topics covered.
- Teaches you about various frameworks from data science toolbox such as Tableau, SSIS and Gretl.
- Teaches communication, which is an essential skill for a data scientist.

**Cons:**

- Does not cover the majority of topics required for a data scientist such as basic programming with R and Python, probability, statistics and many machine learning algorithms.
- Not a course for beginners who do not have experience with basic programming, statistics and probability.
- Little coverage of programming languages such as R or Python; the focus is more on data science frameworks.
- No real-world projects or case studies.

# Books

There are tons of books out there for data science. Some are tool-specific, like using R or Python for data science, and others focus more on the basic skills required for data science. These books should be able to get you started with data science projects, and these are the ones we recommend:

## 1. Data Science from Scratch — O’Reilly

**Content:**

- Introduction to data science
- Crash Course in Python
- Visualizing Data
- Linear Algebra
- Statistics
- Probability
- Hypothesis and Inference
- Gradient Descent
- Getting Data
- Working with data
- Machine Learning
- K-nearest neighbors
- Naive Bayes
- Simple Linear Regression
- Multiple Regression
- Logistic Regression
- Decision Trees
- Neural Networks
- Deep Learning
- Clustering
- Natural Language Processing
- Network Analysis
- Recommender Systems
- Databases and SQL
- Map Reduce
- Data ethics
- Go forth and do data science

**About:**

This book is for people with some knowledge of programming (in any language), but Python is not a prerequisite as it starts with a crash course in Python. Most of the book is focused on machine learning algorithms, providing a good understanding of these algorithms along with their implementations. It teaches the algorithms from scratch: rather than relying on Python libraries like scikit-learn, you implement each algorithm by hand. That makes it a good book for learning how various machine learning algorithms actually work. The book also covers many of the basic topics for data science such as Python programming and data cleaning and visualization, and you will be able to get your hands dirty with some data science projects.
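To show what implementing an algorithm "from scratch" means in practice, here is a minimal sketch of simple linear regression using the closed-form least-squares solution. This is our own illustration and not code from the book; the function names and the toy data are made up.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept

# Made-up data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)
print(predict(slope, intercept, 6.0))
```

Writing the formula out like this makes it clear what a library call such as a `fit` method is doing under the hood, which is the main benefit of the from-scratch approach.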

## 2. Doing Data Science — O’Reilly

**Content:**

- Introduction to Data Science
- Statistical Inference, Exploratory Data Analysis, and the Data Science Process
- Algorithms
- Spam Filters, Naive Bayes, and Wrangling
- Logistic Regression
- Time Stamps and Financial Modelling
- Extracting Meaning from Data
- Recommendation Engines: Building a User-Facing Data Product at Scale
- Data Visualization and Fraud Detection
- Social Networks and Data Journalism
- Causality
- Epidemiology
- Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
- Data Engineering: MapReduce, Pregel, and Hadoop
- The Students Speak
- Next-Generation Data Scientists, Hubris, and Ethics

**About:**

*Doing Data Science* from O’Reilly is another great book for data science beginners. It strikes a good balance between statistics and machine learning. In this book, you will find various data science case studies from data scientists at Google, Microsoft and eBay, as well as sample code and exercises to help you learn the concepts. It also covers topics like MapReduce, Hadoop and Pregel (a system for large-scale graph processing), which are not covered in most other books or online courses.

The initial chapters will help you gain the skills required for data exploration and making inferences from data. Middle chapters are more focused on machine learning, recommendation systems, data visualization and model evaluation techniques. Later chapters teach about Data Engineering, an aspect of data science that focuses on the practical application of data collection and analysis, which is a very important topic in data science, especially when you are handling enormous amounts of data.

This book is written from a statistical perspective and is enough to get you started with data science.

## 3. The Art of Data Science — Roger D. Peng and Elizabeth Matsui

**Content:**

- Data Analysis as Art
- Epicycles of Analysis
- Stating and Refining the Question
- Exploratory Data Analysis
- Using Models to Explore Your Data
- Inference: A Primer
- Formal Modeling
- Inference vs. Prediction: Implications for Modeling Strategy
- Interpreting Your Results
- Communication

**About:**

This book was written as a companion to the Coursera Johns Hopkins Specialization. It is a high-level overview of the data science workflow. It is a good book for students who have no practical experience in data science. The book contains R code snippets to help you along with the chapters.

This book does a great job of explaining data analysis and how to interpret results. Though it lacks some of the important topics of data science, it is useful for beginners to learn about data science and the common pipeline used when working on a data science project.

# Additional Resources

Here are some additional resources that can be helpful for you in your data science journey.

**The Data Science Handbook:** A comprehensive overview of data science covering data analytics, programming and business skills necessary to master the discipline.

**The Open Source Data Science Masters:** It is an open-source collection of books, courses and articles to learn data science, with content around both the theory and technologies of data science.

**Harvard Data Science Courses:** Collection of data science courses created by Harvard University.