What you will learn

  • Fundamental R programming skills
  • Statistical concepts such as probability, inference, and modeling, and how to apply them in practice
  • The tidyverse, including data visualization with ggplot2 and data wrangling with dplyr
  • Essential tools for practicing data scientists, such as Unix/Linux, git and GitHub, and RStudio
  • How to implement machine learning algorithms
  • Fundamental data science concepts, explored in depth through motivating real-world case studies

Program Class List

1
Data Science: R Basics

Build a foundation in R and learn how to wrangle, analyze, and visualize data.

2
Data Science: Visualization

Learn basic data visualization principles and how to apply them using ggplot2.

3
Data Science: Probability

Learn probability theory, an essential skill for a data scientist, using a case study on the financial crisis of 2007-2008.

4
Data Science: Inference and Modeling

Learn inference and modeling, two of the most widely used statistical tools in data analysis.

5
Data Science: Productivity Tools

Keep your projects organized and produce reproducible reports using GitHub, git, Unix/Linux, and RStudio.

6
Data Science: Wrangling

Learn to process and convert raw data into formats needed for analysis.

7
Data Science: Linear Regression

Learn how to use R to implement linear regression, one of the most common statistical modeling approaches in data science.

8
Data Science: Machine Learning

Build a movie recommendation system and learn the science behind one of the most popular and successful data science techniques.

9
Data Science: Capstone

Show what you've learned from the Professional Certificate Program in Data Science.

Meet your instructor

Rafael Irizarry

Professor of Biostatistics at Harvard University
Rafael Irizarry is a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health and a Professor of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute. For the past 15 years, Dr. Irizarry’s research has focused on the analysis of genomics data. During this time, he has also taught several classes, all related to applied statistics. Dr. Irizarry is one of the founders of the Bioconductor Project, an open-source and open-development software project for the analysis of genomic data. His publications on these topics have been highly cited, and his software implementations have been widely downloaded.