You are thinking about becoming a data scientist. You have looked around and soon enough realized that R is one of the languages you need to learn. But what is R?
R is a free software environment designed for statistical computing and data analysis, widely used among statisticians, data miners and data scientists, both in industry and academia. In fact, R is one of the most widely used languages for data science, along with Python.
Why is R good for Data Science?
R was designed for statistical analysis, and therefore the repertoire of statistical tests available in R is hard to beat. In addition, there is a growing number of libraries that help with everyday data science tasks, including i) implementation of machine learning algorithms, ii) data gathering (through SQL queries or from the web, among others), iii) data visualization and iv) presentation of results.
R is used as the data science and machine learning language of choice in many companies, including big names. And perhaps more exciting, R is extensively used in academia, for statistical analysis in the context of genome analysis, drug testing, RNA sequencing and more.
Should I learn R?
Whether you learn Python or R for data science is your choice. R is better at certain tasks, like statistical analysis, whereas Python is better in other areas, like software development. Whatever you decide, if you are stepping into Data Science, you will make your life easier if you stick to one language and learn it well.
One good reason to start by learning R for Data Science is that the best online courses to learn Data Science use R. Therefore, if you choose to learn R, these courses will allow you to master 2 skills at the same time. Is that not amazing? I will cover these resources in detail later on in this post.
Another good reason to start by learning R is that there are plenty of resources available online to learn R for Data Science. Some of them are short, free courses, which will give you a sense of whether this is something you enjoy. Learning to use a programming language requires a big time commitment and a lot of practice, so it certainly has to be something you like doing to ensure success.
What exactly do I need to learn?
It is essential that you know the R fundamentals, which include how to install R, how to execute R code, and how to perform operations with the basic data structures: vectors, lists, matrices, factors and data frames. Next, you need to become familiar with the installation and use of R libraries.
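To give a flavor of the basic data structures mentioned above, here is a minimal sketch in base R. The variable names and values are made up for illustration only.

```r
# Vector: an ordered collection of values of a single type
heights <- c(1.62, 1.75, 1.80)

# List: an ordered collection whose elements may differ in type
person <- list(name = "Ana", age = 34, languages = c("R", "SQL"))

# Matrix: a two-dimensional arrangement of values of a single type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Factor: a categorical variable with a fixed set of levels
size <- factor(c("small", "large", "small"), levels = c("small", "large"))

# Data frame: a table whose columns are equal-length vectors
df <- data.frame(id = 1:3, height = heights, size = size)

mean(heights)          # average of a numeric vector
person$languages[1]    # first element of a list component
dim(m)                 # number of rows and columns
table(size)            # counts per factor level
df[df$height > 1.7, ]  # rows where height exceeds 1.7
```

Most of what you do later in data science builds on these five structures, and on data frames in particular.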
Libraries, known in R as packages, are add-ons that allow you to carry out specific tasks, like data manipulation and cleaning, data visualization, and the implementation of machine learning algorithms. Among them you will find caret for machine learning; dplyr, tidyr, stringr and lubridate for data cleaning and manipulation; ggplot2 for data visualisation; and shiny and R Markdown for results presentation, among many others.
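Installing and loading a package usually looks something like the sketch below. The install line is commented out because it needs an internet connection, and I use the stats package for the runnable part only because it ships with every R installation; dplyr is just an example name here.

```r
# Install a package from CRAN (done once per machine, requires internet access).
# Commented out so that this sketch also runs offline:
# install.packages("dplyr")

# Load an installed package into the current session.
# 'stats' ships with R, so this call works on any installation.
library(stats)

# Check whether a package is available on this machine.
# This loads its namespace but does not attach it to the session.
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

# Call a function from a specific package with the package::function form.
middle <- stats::median(c(3, 1, 2))
```

The package::function form is handy when two packages define functions with the same name.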
Once you have learnt the R fundamentals and the use of the main data science and data manipulation libraries, you are ready to begin to tackle your own data science projects. Not so bad, right?
Where can I learn R then?
In this post, I describe the most widely used resources to learn R for Data Science. Some of them will give you a flavor of R, and others will allow you to make a deeper dive into programming and Data Science. There are courses to fit all time commitments and budgets. I will also make a few recommendations of the resources that I think work best and are a must-have for any self-taught data scientist. Let’s get started…
Disclaimer: Opinions stated in the article are my own and I am not financially compensated for any of the links in this blog. This blog does not contain affiliate links.
Short and Free Online Courses
Comprehensive Online Courses
- The Analytics Edge, MIT (edX)
- Data Science Specialization, Johns Hopkins University (Coursera)
- R Programming A-Z™: R for Data Science with Real Exercises! (Udemy)
Books for R programming for Data Science
- Hands-On Programming with R, Garrett Grolemund
- R for Data Science, Hadley Wickham and Garrett Grolemund
- R Programming for Data Science, Roger Peng
- Exploratory Data Analysis, Roger Peng
Other Resources to Learn R
- What are some good resources for learning R?
- Learning Path on R — Step by Step Guide to Learn Data Science on R
Short and Free Online Courses
There are several short and free R programming courses on the web that will teach you the very basics of R programming. In these courses you will learn how to install R and RStudio, how to perform basic R operations and how to work with vectors, matrices, data frames, factors and lists. These are normally referred to as R fundamentals.
These short courses take 2–4 hrs to complete, plus some additional time to do the exercises. On the upside, they will give you a quick sense of whether this is something you like. However, the applications to data science will be minimal, and you will certainly need a follow-up course to master the language. Below I highlight the most recommended short and free R courses available online.
1. Introduction to R (DataCamp)
Introduction to R on DataCamp is an introductory course, ideal for complete beginners with no programming experience. It teaches the very basics of R, including factors, lists and data frames, and so takes you through the very first steps of programming in R.
Introduction to R takes 4 hrs to complete and comes with 62 practice exercises to help you learn the language. This basic course is free; however, more advanced courses on DataCamp require payment.
2. R Basics — R programming Language Introduction (Udemy)
R Basics is an introductory course for the complete beginner, available on Udemy. R Basics teaches the first steps in R coding and the use of data structures, and also provides an introduction to some of the libraries for data visualisation, like lattice, as well as to text analytics and machine learning. These lectures are very introductory though, so keep your expectations at the right level. R Basics is right for you if you want to get a sense of what R and Data Science are. R Basics has received very positive reviews from almost a thousand students, so it is certainly an option to check out.
Comprehensive Online Courses
In addition to the short introductory courses on R programming, there are very good comprehensive online courses to learn R programming and Data Science. These courses run across several weeks, or months, depending on how much time you can put into doing the exercises.
These comprehensive online courses will take you through the many aspects of Data Science, including data gathering, data cleaning, data analysis and visualization, and building machine learning models, all using R as the programming language. Some courses will provide you with a good intuition about these concepts, while others will make a deeper dive into the content.
Best Choice for complete beginners
1. The Analytics Edge, MIT (edX)
The Analytics Edge on edX, designed and taught by instructors from MIT, is a top choice as a starting point to learn R for Data Science, particularly if you have little or no experience with programming languages. The course will teach you how to use R for data analysis and machine learning, as it explores a variety of data from real business scenarios. You will learn to use R and to tackle data science problems all in one go.
The course is mostly oriented towards machine learning and statistics, teaching you how to perform basic statistical calculations, create data visualizations and build machine learning models, all with R. It will give you a solid grounding to start tackling your own data science projects. However, The Analytics Edge falls a bit short on programming principles and best practices. Don’t worry though, you can pick this up later.
The Analytics Edge was archived last time I checked, which means that it won’t run any more on a yearly basis, but the material should still be available.
Top recommendation ✔️
2. Data Science Specialization, Johns Hopkins University (Coursera)
The Data Science Specialization on Coursera is a comprehensive specialization designed and taught by professors from Johns Hopkins University. The specialization is composed of 9 courses that cover R programming, data acquisition and cleaning, data exploration, the scientific approach to data analysis and project design, statistics, machine learning and data visualization. It is very exhaustive in its content and covers every essential aspect of Data Science, all of them carried out using R.
The Data Science Specialization was designed to give you an end-to-end view of the development of a data science project, from inception and planning, to execution, to result dissemination. It also gives you an overview of the tools required for data science, which are R programming of course, but also the use of Git for code version control, SQL and web scraping among others.
The first course of the Specialization, The Data Scientist’s Toolbox, is designed to introduce you to the tools, software and knowledge required for Data Science. In the second course, R Programming, you will learn the fundamentals of R programming, including flow control with if statements and for and while loops, functions, and how to perform operations on data structures like lists, vectors, data frames and matrices, among others.
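To illustrate what flow control and functions look like in R, here is a minimal sketch. It is my own toy example, not material from the course, and the function and variable names are made up.

```r
# A function that labels a number, using if / else
describe_number <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# for loop: accumulate the sum of the squares of 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i^2
}

# while loop: count how many halvings bring 100 below 1
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

describe_number(-3)  # "negative"
total                # 55
steps                # 7
```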
In the third course, Getting and Cleaning Data, you will learn how to collect data from various sources, like databases or the web. You will also learn how to slice, clean and manipulate these data with the most widely used R libraries for data manipulation.
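As a rough idea of what slicing and cleaning data involves, the sketch below uses base R only, so that it runs without installing anything. The course itself relies on dedicated libraries for this, and the data here is invented for the example.

```r
# Hypothetical raw data with untidy text, a missing value and a duplicate row
raw <- data.frame(
  city   = c(" london", "Paris ", "paris", "Rome", "Rome"),
  temp_c = c(15.5, NA, 18.0, 22.1, 22.1),
  stringsAsFactors = FALSE
)

clean <- raw

# Trim surrounding whitespace and standardise capitalisation
clean$city <- tolower(trimws(clean$city))

# Drop rows with a missing temperature
clean <- clean[!is.na(clean$temp_c), ]

# Remove exact duplicate rows
clean <- unique(clean)

# Slice: keep only the rows warmer than 16 degrees
warm <- clean[clean$temp_c > 16, c("city", "temp_c")]
```

Packages like dplyr and tidyr express the same steps more compactly, which is why they are so widely used.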
The next courses will introduce you to the scientific way of developing and presenting a data science project. You will learn how to analyze and present data in the course Exploratory Data Analysis and how to document a scientific project in the course Reproducible Research, as well as many aspects of statistics, regression and some practical implementations of machine learning in the courses Statistical Inference, Regression Models and Practical Machine Learning. Throughout these courses, you will dive into data plotting using various R libraries for visualization, and then learn various techniques for data representation, statistical modelling and machine learning, including dimensionality reduction.
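To make these topics a bit more concrete, here is a small sketch of a regression model and a dimensionality reduction in base R. It uses mtcars, an example data set that ships with R. The choice of variables is mine and is not taken from the courses.

```r
# mtcars is a small example data set that ships with R
data(mtcars)

# Exploratory step: summary statistics for fuel efficiency and weight
summary(mtcars[, c("mpg", "wt")])

# Regression model: fuel efficiency as a linear function of weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope

# Dimensionality reduction: principal component analysis on scaled variables
pca <- prcomp(mtcars[, c("mpg", "wt", "hp", "disp")], scale. = TRUE)
summary(pca)$importance["Proportion of Variance", ]
```

The slope of the fitted line tells you how fuel efficiency changes with weight, and the last line shows how much of the variation each principal component captures.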
The Data Science Specialization is more science and research oriented, and includes several examples from scientific topics on which the professors work regularly, like air pollution and RNA analysis. You will get a good sense of how scientists set out to answer specific questions, build hypotheses and present their data. The Data Science Specialization is less shy with mathematical and statistical concepts than the other courses covered in this post. It includes formulas and mathematical explanations of the machine learning models as well as probability distribution theory. This deeper dive will, however, help you become more resourceful when building and optimizing your machine learning algorithms. This is why this specialization is the one I recommend the most to learn R for data science.
The professors of the Data Science Specialization have also created accompanying books, which you can get in addition to the courses, or as an alternative if you prefer reading over watching videos. I cover these books later on in the post.
You can audit all the courses of the Data Science Specialization for free, or you can pay a fee if you wish to get a certification.
3. R Programming A-Z™: R for Data Science with Real Exercises! (Udemy)
R Programming A-Z™: R for Data Science with Real Exercises! is an extensive course tailored to complete beginners, with minimal or no knowledge of R and statistics. The course focuses on teaching R through practice, builds up step by step, and includes some real-world examples. With R Programming A-Z™ you will learn the R fundamentals, like the common operations with vectors, lists, matrices and data frames, and the course also includes a section on data visualization.
R Programming A-Z™ is a more comprehensive choice to get you started with R than the shorter courses mentioned previously. However, it is not as thorough and extensive as the Data Science Specialization from Coursera in terms of data science topics, and it does not cover machine learning either. The course is a good option for a quick start with R programming if that is what you are after. You can then learn the machine learning topics from another resource.
Courses on Udemy are not free; however, Udemy and the instructors release discount coupons regularly.
1. R Tutorial on Cyclismo
The R Tutorial on Cyclismo is a very basic online tutorial that covers the basics of R programming, including operations with factors, lists, data frames and matrices. It also covers data manipulation, statistics and machine learning to a certain extent. This tutorial is perhaps good for a first dive into R programming, to get familiar with R syntax and programming style. R Tutorial on Cyclismo was developed by students who were new to R, but had some knowledge of computing.
2. An Introduction to R by Venables and Smith
An Introduction to R by Venables and Smith is a free online book recommended by Quick-R, a highly popular website for R users. It covers R basics like factors, lists and matrices, and also probability distributions, data manipulation, flow control (loops and conditions) and how to use and make R packages. Because its content is so exhaustive, this book is well suited for complete beginners as well as those with some knowledge of R. An Introduction to R is a very good alternative to online courses, and it is also open source.
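As a taste of one of the topics listed above, probability distributions, base R provides a family of functions for each distribution that follow a d/p/q/r naming pattern. The sketch below uses the normal distribution. The numbers are arbitrary and the example is mine, not the book's.

```r
# Normal distribution functions follow a d/p/q/r naming pattern
dnorm(0)       # density of the standard normal at 0
pnorm(1.96)    # probability of a value at or below 1.96
qnorm(0.975)   # value below which 97.5% of the distribution lies

set.seed(42)   # fix the seed so the random draws are reproducible
draws <- rnorm(1000, mean = 10, sd = 2)
mean(draws)    # sample mean, close to 10
```

The same pattern applies to other distributions, for example dbinom, pbinom, qbinom and rbinom for the binomial.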
Books for R programming for Data Science
There are a number of books from which you can learn R programming for data science, written by top data scientists, scientific researchers and R package developers themselves. In addition, most of them are available for free, so if you want to learn R for data science you are definitely in a great place. There are very good reviews of these books already available on the web, so here I will describe a few of the books and then point you in the right direction to get more information if needed. Let’s dive in…
1. Hands-On Programming with R, Garrett Grolemund
Hands-On Programming with R will teach you how to program in R using, as its name indicates, hands-on examples. It is excellent for non-programmers, as the book was designed to provide a friendly introduction to the R language. The book covers the R fundamentals, including working with R data structures and objects, and how to control flow.
2. R for Data Science, Hadley Wickham and Garrett Grolemund
R for Data Science is the follow-up book to Hands-On Programming with R. It teaches how to analyse and manipulate data with the R packages created by one of its authors, Hadley Wickham.
3. R Programming for Data Science, Roger Peng
R Programming for Data Science is the book that accompanies the R Programming course of the Data Science Specialization on Coursera. The book covers the basics of R programming and some additional topics, such as regular expressions and debugging, that are also useful for data science projects.
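To show what regular expressions look like in R, and one simple way of dealing with errors, here is a minimal sketch in base R. The file names are invented, and the example is mine rather than taken from the book.

```r
# Hypothetical file names to search through
files <- c("sales_2019.csv", "sales_2020.csv", "notes.txt")

# grepl: which elements match a pattern (here, names ending in .csv)?
is_csv <- grepl("\\.csv$", files)

# sub: extract the four-digit year using a capture group
years <- sub(".*_([0-9]{4})\\.csv$", "\\1", files[is_csv])

# tryCatch: handle an error instead of letting it stop the script
result <- tryCatch(
  stop("something went wrong"),
  error = function(e) paste("Caught:", conditionMessage(e))
)

is_csv   # TRUE TRUE FALSE
years    # "2019" "2020"
result   # "Caught: something went wrong"
```

For interactive debugging, base R also offers tools such as traceback() and browser(), which let you inspect where an error happened.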
4. Exploratory Data Analysis, Roger Peng
Exploratory Data Analysis is another book that accompanies the Data Science Specialization on Coursera, covering data analysis, data manipulation and the visualisation of high-dimensional data.