Data Science: Machine Learning and Predictions

Course 3 of 3: Professional Certificate® in the Foundations of Data Science
6 weeks, 4–6 hours per week

About this course

One of the principal responsibilities of a data scientist is to make reliable predictions based on data. When the amount of data available is enormous, it helps if some of the analysis can be automated. Machine learning is a way of identifying patterns in data and using them to automatically make predictions or decisions. In this data science course, you will learn basic concepts and elements of machine learning.

The two main methods of machine learning you will focus on are regression and classification. Regression is used when you seek to predict a numerical quantity. Classification is used when you try to predict a category (e.g., given information about a financial transaction, predict whether it is fraudulent or legitimate).

For regression, you will learn how to measure the correlation between two variables and compute a best-fit line for making predictions when the underlying relationship is linear. The course will also teach you how to quantify the uncertainty in your prediction using the bootstrap method. These techniques will be motivated by a wide range of examples.
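As a rough illustration of the regression ideas above, the following Python sketch computes a correlation, a best-fit line, and a bootstrap interval for a prediction. The function names and the toy data are made up for this example and are not taken from the course materials; the interval shown is a simple 95% percentile interval, which is one common way to apply the bootstrap.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation: the mean of the products of standard units."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return statistics.fmean(
        ((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)
    )


def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    r = correlation(xs, ys)
    slope = r * statistics.pstdev(ys) / statistics.pstdev(xs)
    intercept = statistics.fmean(ys) - slope * statistics.fmean(xs)
    return slope, intercept


def bootstrap_prediction_interval(xs, ys, new_x, repetitions=1000, seed=0):
    """Approximate 95% percentile interval for the line's height at new_x."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    predictions = []
    for _ in range(repetitions):
        # Resample the (x, y) pairs with replacement, keeping the sample size.
        sample = rng.choices(pairs, k=len(pairs))
        sample_xs, sample_ys = zip(*sample)
        # Skip degenerate resamples in which x or y does not vary at all.
        if statistics.pstdev(sample_xs) == 0 or statistics.pstdev(sample_ys) == 0:
            continue
        slope, intercept = best_fit_line(sample_xs, sample_ys)
        predictions.append(slope * new_x + intercept)
    predictions.sort()
    lower = predictions[int(0.025 * len(predictions))]
    upper = predictions[int(0.975 * len(predictions)) - 1]
    return lower, upper


# Made-up data with a roughly linear relationship (illustration only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, intercept = best_fit_line(xs, ys)
prediction = slope * 5 + intercept
lower, upper = bootstrap_prediction_interval(xs, ys, new_x=5)
print(f"prediction at x=5: {prediction:.2f}, interval: ({lower:.2f}, {upper:.2f})")
```

A narrow interval suggests the prediction would change little if a different sample had been drawn; the approach is only appropriate when the underlying relationship is roughly linear.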

For classification, you will learn the k-nearest neighbor classification algorithm, learn how to measure the effectiveness of your classifier, and apply it to real-world tasks including medical diagnoses and predicting genres of movies.
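The idea behind k-nearest neighbor classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function names are hypothetical, the two-feature "transaction" data is invented, distance is plain Euclidean distance, and effectiveness is measured as the proportion of held-out test points classified correctly.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature tuples of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(training, point, k):
    """Predict a label by majority vote among the k nearest training rows.

    `training` is a list of (features, label) pairs.
    """
    nearest = sorted(training, key=lambda row: distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(training, test, k):
    """Proportion of held-out test rows whose label is predicted correctly."""
    correct = sum(classify(training, features, k) == label
                  for features, label in test)
    return correct / len(test)


# Invented data: two well-separated clusters (illustration only).
training = [
    ((1.0, 1.2), "legitimate"),
    ((0.8, 1.0), "legitimate"),
    ((1.3, 0.9), "legitimate"),
    ((4.0, 4.2), "fraudulent"),
    ((4.3, 3.9), "fraudulent"),
    ((3.8, 4.1), "fraudulent"),
]
test = [
    ((1.1, 1.0), "legitimate"),
    ((4.1, 4.0), "fraudulent"),
]

print(classify(training, (1.1, 1.0), k=3))
print(accuracy(training, test, k=3))
```

Measuring accuracy on points that were not used for training matters: testing on the training set itself tends to give the overly optimistic results the course warns about.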

The course will highlight the assumptions underlying the techniques, and will provide ways to assess whether those assumptions are good. It will also point out pitfalls that lead to overly optimistic or inaccurate predictions.

What you’ll learn

  • Fundamental concepts of machine learning
  • Linear regression, correlation, and the phenomenon of regression to the mean
  • Classification using the k-nearest neighbors algorithm
  • How to compare and evaluate the accuracy of machine learning models
  • Basic probability and Bayes’ theorem


Foundations of Data Science: Computational Thinking with Python

Foundations of Data Science: Inferential Thinking by Resampling

Meet Your Instructors

Ani Adhikari

Teaching Professor of Statistics at UC Berkeley

Ani Adhikari, Senior Lecturer in Statistics at UC Berkeley, has received the Distinguished Teaching Award at Berkeley and the Dean's Award for Distinguished Teaching at Stanford University. While her research interests are centered on applications of statistics in the natural sciences, her primary focus has always been on teaching and mentoring students. She teaches courses at all levels and has a particular affinity for teaching statistics to students who have little mathematical preparation. She received her undergraduate degree from the Indian Statistical Institute, and her Ph.D. in Statistics from Berkeley.

John DeNero

Giancarlo Teaching Fellow in the EECS Department at UC Berkeley

John DeNero is the Giancarlo Teaching Fellow in the UC Berkeley EECS Department. He joined the Cal faculty in 2014 to focus on undergraduate education in computer science and data science. He teaches and co-develops two of the largest courses on campus: introductory computer science for majors (3,000 students per year) and introductory data science (1,500 students per year).

David Wagner

Professor of Computer Science at UC Berkeley

David Wagner is Professor of Computer Science at the University of California at Berkeley. He has published over 100 peer-reviewed papers in the scientific literature and has co-authored two books on encryption and computer security. His research has analyzed and contributed to the security of cellular networks, 802.11 wireless networks, electronic voting systems, and other widely deployed systems.

Experience Level


Learning Partner

University of California, Berkeley

Program Type

Professional Certificate


Data Analysis & Statistics Data Science IT