Data Science Fundamentals LiveLessons teaches you the foundational concepts, theory, and techniques you need to know to become an effective data scientist. The videos present you with applied, example-driven lessons in Python and its associated ecosystem of libraries, where you get your hands dirty with real datasets and see real results.
Description
If nothing else, by the end of this video course you will have analyzed a number of datasets from the wild, built a handful of applications, and applied machine learning algorithms in meaningful ways to get real results. And along the way you learn the best practices and computational techniques used by a professional data scientist. More specifically, you learn how to acquire data that is openly accessible on the Internet by working with APIs. You learn how to parse XML and JSON data to load it into a relational database.
About the Instructor
Jonathan Dinu is an author, researcher, and most importantly, an educator. He is currently pursuing a Ph.D. in Computer Science at Carnegie Mellon’s Human Computer Interaction Institute (HCII), where he is working to democratize machine learning and artificial intelligence through interpretable and interactive algorithms. Previously, he founded Zipfian Academy (an immersive data science training program acquired by Galvanize), has taught classes at the University of San Francisco, and has built a Data Visualization MOOC with Udacity. In addition to his professional data science experience, he has run data science trainings for a Fortune 500 company and taught workshops at Strata, PyData, and DataWeek (among others). He first discovered his love of all things data while studying Computer Science and Physics at UC Berkeley, and in a former life he worked for Alpine Data Labs developing distributed machine learning algorithms for predictive analytics on Hadoop.
Jonathan has always had a passion for sharing the things he has learned in the most creative ways he can. When he is not working with students, you can find him blogging about data, visualization, and education at hopelessoptimism.com or rambling on Twitter @jonathandinu.
Skill Level
- Beginner
What You Will Learn
- How to get up and running with a Python data science environment
- The essentials of Python 3, including object-oriented programming
- The basics of the data science process and what each step entails
- How to build a simple (yet powerful) recommendation engine for Airbnb listings
- Where to find quality data sources and how to work with APIs programmatically
- Strategies for parsing JSON and XML into a structured form
- The basics of relational databases and how to use an ORM to interface with them in Python
- Best practices of data validation, including common data quality checks
Who Should Take This Course
- Aspiring data scientists looking to break into the field and learn the essentials necessary
- Journalists, consultants, analysts, or anyone else who works with data and is looking to take a programmatic approach to exploring data and conducting analyses
- Quantitative researchers interested in applying theory to real projects and taking a computational approach to modeling.
- Software engineers interested in building intelligent applications driven by machine learning
- Practicing data scientists already familiar with another programming environment looking to learn how to do data science with Python
Course Requirements
- Basic understanding of programming
- Familiarity with Python and statistics is a plus
Lesson Descriptions
Lesson 1: Introduction to Data Science with Python
Lesson 1 begins with a working definition of data science (as we use it in the course), gives a brief history of the field, and provides motivating examples of data science products and applications. This lesson covers how to get set up with a data science programming environment locally, as well as gives you a crash course in the Python programming language if you are unfamiliar with it or are coming from another language such as R. Finally, it ends with an overview of the concepts and tools that the rest of the lessons cover, to motivate you and get you excited about what's to come!
Lesson 2: The Data Science Process—Building Your First Application
Lesson 2 introduces the data science process by walking through an end-to-end example of building your very first data science application, an AirBnB listing recommender.
You continue to learn how to work with and manipulate data in Python, without any external libraries yet, and leverage the power of the built-in Python standard library. The core application of this lesson covers the basics of building a recommendation engine and shows you how, with simple statistics and a little ingenuity, you can build a compelling recommender, given the right data. And finally, it ends with a formal treatment of the data science process and the individual steps it entails.
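As a flavor of the idea, here is a minimal sketch (not the course's code) of a similarity-based recommender that uses only the standard library; the listing fields, weights, and data are hypothetical.

```python
# A minimal sketch of a similarity-based recommender using only the standard
# library. Listing fields and the example data are hypothetical.
from math import sqrt

listings = [
    {"id": 1, "price": 80.0, "rating": 4.7, "bedrooms": 1},
    {"id": 2, "price": 120.0, "rating": 4.9, "bedrooms": 2},
    {"id": 3, "price": 75.0, "rating": 4.2, "bedrooms": 1},
]

def distance(a, b):
    """Euclidean distance over a few numeric listing features."""
    features = ("price", "rating", "bedrooms")
    return sqrt(sum((a[f] - b[f]) ** 2 for f in features))

def recommend(liked, candidates, n=2):
    """Return the n listings most similar to one the user liked."""
    others = [c for c in candidates if c["id"] != liked["id"]]
    return sorted(others, key=lambda c: distance(liked, c))[:n]

print(recommend(listings[0], listings))
```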
Lesson 3: Acquiring Data—Sources and Methods
Lesson 3 begins the treatment of each of the specific stages of the data science process, starting with the first: data acquisition. The lesson covers the basics of finding the appropriate data source for your problem and how to download the datasets you need once you have found them.
Starting with an overview of how the infrastructure behind the Internet works, you learn how to programmatically make HTTP requests in Python to access data through APIs, as well as the basics of two of the most common data formats: JSON and XML. The lesson ends by setting up the dataset we use for the rest of the course: Foursquare Venues.
Working with the Foursquare dataset, you learn how to interact with APIs and do some minor web scraping. You also learn how to find and acquire data from a variety of sources and keep track of its lineage along the way. You learn to put yourself in the data science mindset and to see the data (hidden in plain sight) that we interact with every day.
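As a rough illustration of this kind of programmatic acquisition, the sketch below makes an HTTP request and parses a JSON response using only the standard library. The endpoint, query parameters, and response fields are placeholders, not the actual Foursquare Venues API or credentials used in the course.

```python
# A hedged sketch of acquiring data over HTTP with the standard library.
# The URL and parameters are placeholders, not a real API endpoint.
import json
import urllib.parse
import urllib.request

base_url = "https://api.example.com/venues/search"   # placeholder endpoint
params = {"near": "San Francisco, CA", "query": "coffee"}
url = base_url + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    payload = json.loads(response.read().decode("utf-8"))

# Most JSON APIs return nested dictionaries and lists; drill in accordingly.
for venue in payload.get("venues", []):
    print(venue.get("name"))
```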
Lesson 4: Adding Structure—Data Parsing and Storage
Lesson 4 picks up with the second stage of what traditionally is referred to as an extract, transform, and load (ETL) pipeline, adding structure through the transformation of raw data.
You see how to work with a variety of data formats, including XML and JSON, by parsing the data we have acquired to eventually load it into an environment better-suited to exploration and analysis: a relational database. But before we load our data into a database, we take a short diversion to talk about how to conceptually model structure in data with code. You get a primer in object-oriented programming and learn how to leverage it to create abstractions and data models that define how you can interface with your data.
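To make the idea concrete, here is a minimal sketch of turning one parsed JSON record into a small Python class that acts as a data model before it is loaded into a database. The Venue fields are hypothetical and simplified.

```python
# A minimal sketch of modeling parsed JSON as a Python class.
# The record layout and field names are hypothetical.
import json

class Venue:
    """A lightweight data model for one record of acquired data."""

    def __init__(self, name, city, checkins):
        self.name = name
        self.city = city
        self.checkins = checkins

    @classmethod
    def from_json(cls, record):
        """Build a Venue from one parsed JSON object."""
        return cls(record["name"],
                   record["location"]["city"],
                   record["stats"]["checkinsCount"])

raw = '{"name": "Blue Bottle", "location": {"city": "Oakland"}, "stats": {"checkinsCount": 1200}}'
venue = Venue.from_json(json.loads(raw))
print(venue.name, venue.city, venue.checkins)
```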
Lesson 5: Storing Data: Relational Databases (with SQLite)
Lesson 5 starts with an introduction to one of the most ubiquitous data technologies—the relational database. The lesson serves as an end cap to the ETL pipeline of the previous videos. You learn the ins and outs of the various strategies for storing data and see how to map the abstractions you created in Python to database tables through the use of an object-relational mapper (ORM). The interface an ORM provides gives you the best of both worlds: you can query and manipulate data with Python while reliably persisting it in a database.
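As a hedged sketch of the ORM idea, the example below maps a Python class to a SQLite table using SQLAlchemy. The specific ORM, table, and columns here are illustrative choices, not necessarily the ones used in the course.

```python
# A hedged sketch of mapping a Python class to a SQLite table with an ORM
# (SQLAlchemy is used as an example; the table and columns are hypothetical).
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Listing(Base):
    __tablename__ = "listings"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    price = Column(Float)

engine = create_engine("sqlite:///listings.db")   # file-backed SQLite database
Base.metadata.create_all(engine)                  # create the table if needed

Session = sessionmaker(bind=engine)
session = Session()
session.add(Listing(name="Sunny studio", price=85.0))
session.commit()

# Query and manipulate rows as ordinary Python objects.
for listing in session.query(Listing).filter(Listing.price < 100):
    print(listing.name, listing.price)
```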
Lesson 6: Data Validation and Exploration
Lesson 6 starts by showing you how to effectively query your data to understand what it contains, uncover any biases lurking in it, and learn best practices for dealing with missing values. After you have validated the quality of the data, you use descriptive statistics to learn how your data is distributed, as well as the limits of point statistics (single-number estimates) and why it is often necessary to use visual techniques.
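For a taste of what such checks can look like, here is a small sketch that queries the SQLite database from the previous lesson for missing values and point statistics. The table and column names are hypothetical.

```python
# A hedged sketch of simple data-quality checks against a SQLite database.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect("listings.db")
cur = conn.cursor()

# How many rows are missing a price? A common completeness check.
cur.execute("SELECT COUNT(*) FROM listings WHERE price IS NULL")
print("rows with missing price:", cur.fetchone()[0])

# Point statistics give a first look at how a column is distributed,
# but single-number summaries can hide skew and outliers.
cur.execute("SELECT MIN(price), AVG(price), MAX(price) FROM listings")
print("min/avg/max price:", cur.fetchone())

conn.close()
```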
About LiveLessons Video Training
The LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home and Office Technologies, Business and Management, and more. View all LiveLessons on InformIT at: http://www.informit.com/livelessons
About MIT Horizon
MIT Horizon is an expansive content library built to help you explore emerging technologies. Easy-to-understand lessons guide you through the complexities of the latest technologies, with expert-level concepts simplified. Designed for both technical and non-technical learners, the library offers bite-size content that can lead to maximum career outcomes.
For a limited time, gain access to the complete MIT Horizon library.
Register today for exclusive entry.
Program overview
Gain an interdisciplinary understanding of the essential fundamentals of analytics, including analysis methods; analytical tools such as R, Python, and SQL; and business applications.
Using common analytics software and tools, statistical and machine learning methods, and data-intensive computing and visualization techniques, learners will gain the experience necessary to integrate all of these parts for maximum impact.
Project experience is also included as part of the MicroMasters® program. Through these projects, learners will hone their skills with data collection, storage, analysis, and visualization tools, as well as gain instincts for how and when each tool should be used.
These projects provide hands-on experience with real-world business applications of analytics and a deeper understanding of how to apply analytics skills to make the biggest difference.
What you will learn
- Use essential analytics tools like R, Python, SQL, and more.
- Understand fundamental models and methods of analytics, and how and when to apply them.
- Learn to build a data analysis pipeline, from collection and storage through analysis and interactive visualization.
- Apply your new analytics skills in a business context to maximize your impact.
Program Class List
1. Computing for Data Analysis
2. Data Analytics for Business
3. Introduction to Analytics Modeling
Meet your instructors
- Joel Sokol
- Richard W. Vuduc
- Sridhar Narasimhan
- Charles Turnitsa
What you will learn
- The history of data science, tangible illustrations of how data science and analytics are used in decision making across multiple sectors today, and expert opinion on what the future might hold
- A practical understanding of the fundamental methods used by data scientists, including statistical thinking and conditional probability, machine learning and algorithms, and effective approaches for data visualization
- The major components of the Internet of Things (IoT) and the potential of IoT to totally transform the way in which we live and work in the not-too-distant future
- How data scientists are using natural language processing (NLP) and audio and video processing to extract useful information from books, scientific articles, Twitter feeds, voice recordings, YouTube videos, and much more
Program Class List
1. Statistical Thinking for Data Science and Analytics
2. Machine Learning for Data Science and Analytics
3. Enabling Technologies for Data Science and Analytics: The Internet of Things
Meet your instructors
- Tian Zheng
- Kathy McKeown
- Ansaf Salleb-Aouissi
- Cliff Stein
- David Blei
- Itsik Peer
- Mihalis Yannakakis
- Peter Orbanz
- Fred Jiang
- Julia Hirschberg
- Michael Collins
- Shih-Fu Chang
- Zoran Kostic
- Andrew Gelman
- David Madigan
- Lauren Hannah
- Eva Ascarza
- James Curley
Create an end-to-end data analysis workflow in Python using the Jupyter Notebook and learn about the diverse and abundant tools available within the Project Jupyter ecosystem.
Overview
The Jupyter Notebook is a popular tool for learning and performing data science in Python (and other languages used in data science). This video tutorial will teach you about Project Jupyter and the Jupyter ecosystem and get you up and running in the Jupyter Notebook environment. Together, we’ll build a data product in Python, and you’ll learn how to share this analysis in multiple formats, including presentation slides, web documents, and hosted platforms (great for colleagues who do not have Jupyter installed on their machines). In addition to learning and doing Python in Jupyter, you will also learn how to install and use other programming languages, such as R and Julia, in your Jupyter Notebook analysis.
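As one hedged example of sharing an analysis as a web document, the sketch below uses nbconvert's Python API to export a notebook to HTML. The notebook filename is a placeholder, and the course may demonstrate other export paths (such as slides or hosted platforms) as well.

```python
# A minimal sketch of exporting a notebook to a shareable HTML document with
# nbconvert's Python API. The notebook filename is a placeholder.
from nbconvert import HTMLExporter

exporter = HTMLExporter()
body, resources = exporter.from_filename("analysis.ipynb")

with open("analysis.html", "w", encoding="utf-8") as f:
    f.write(body)
```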
Learn How To
- Create a start-to-finish Jupyter Notebook workflow: from installing Jupyter to creating your data analysis and ultimately sharing your results
- Use additional tools within the Jupyter ecosystem that facilitate collaboration and sharing
- Incorporate other programming languages (such as R) in Jupyter Notebook analyses
Who Should Take This Course
- Users new to Jupyter Notebooks who want to use the full range of tools within the Jupyter ecosystem
- Data practitioners who want a repeatable process for conducting, sharing, and presenting data science projects
- Data practitioners who want to share data science analyses with friends and colleagues who do not use or do not have access to a Jupyter installation
Course Requirements
- Basic knowledge of Python.
- Download and install the Anaconda distribution of Python. You can install either version 2.7 or 3.x, whichever you prefer.
- Create a GitHub account (strongly recommended but not required).
- If you are unable to install software on your computer, you can access a hosted version via the Project Jupyter website (click on “try it in your browser”) or through Microsoft’s Azure Notebooks.
About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more.
Meet your instructor
- Jamie Whitacre
What you will learn
- Fundamental R programming skills
- Statistical concepts such as probability, inference, and modeling, and how to apply them in practice
- Experience with the tidyverse, including data visualization with ggplot2 and data wrangling with dplyr
- Familiarity with essential tools for practicing data scientists, such as Unix/Linux, git and GitHub, and RStudio
- How to implement machine learning algorithms
- In-depth knowledge of fundamental data science concepts through motivating real-world case studies
Program Class List
1. Data Science: R Basics
2. Data Science: Visualization
3. Data Science: Probability
4. Data Science: Inference and Modeling
5. Data Science: Productivity Tools
6. Data Science: Wrangling
7. Data Science: Linear Regression
8. Data Science: Machine Learning
9. Data Science: Capstone
Meet your instructor
- Rafael Irizarry
Data Science Fundamentals Part II teaches you the foundational concepts, theory, and techniques you need to know to become an effective data scientist. The videos present you with applied, example-driven lessons in Python and its associated ecosystem of libraries, where you get your hands dirty with real datasets and see real results.
Description
If nothing else, by the end of this video course you will have analyzed a number of datasets from the wild, built a handful of applications, and applied machine learning algorithms in meaningful ways to get real results. And all along the way you learn the best practices and computational techniques used by professional data scientists. You get hands-on experience with the PyData ecosystem by manipulating and modeling data. You explore and transform data with the pandas library, perform statistical analysis with SciPy and NumPy, build regression models with statsmodels, and train machine learning algorithms with scikit-learn. All throughout the course you learn to test your assumptions and models by engaging in rigorous validation. Finally, you learn how to share your results through effective data visualization.
Code: https://github.com/hopelessoptimism/data-science-fundamentals
Resources: http://hopelessoptimism.com/data-science-fundamentals
Forum: https://gitter.im/data-science-fundamentals
Data: http://insideairbnb.com/get-the-data.html
About the Instructor
Jonathan Dinu is an author, researcher, and most importantly educator. He is currently pursuing a Ph.D. in Computer Science at Carnegie Mellon’s Human Computer Interaction Institute (HCII) where he is working to democratize machine learning and artificial intelligence through interpretable and interactive algorithms. Previously, he founded Zipfian Academy (an immersive data science training program acquired by Galvanize), has taught classes at the University of San Francisco, and has built a Data Visualization MOOC with Udacity. In addition to his professional data science experience, he has run data science trainings for a Fortune 500 company and taught workshops at Strata, PyData, and DataWeek (among others). He first discovered his love of all things data while studying Computer Science and Physics at UC Berkeley, and in a former life he worked for Alpine Data Labs developing distributed machine learning algorithms for predictive analytics on Hadoop.
Jonathan has always had a passion for sharing the things he has learned in the most creative ways he can. When he is not working with students you can find him blogging about data, visualization, and education at hopelessoptimism.com or rambling on Twitter @jonathandinu.
Skill Level
- Beginner
What You Will Learn
- How to get up and running with a Python data science environment
- The basics of the data science process and what each step entails
- How (and why) to perform exploratory data analysis in Python with the pandas library
- The theory of statistical estimation to make inferences from your data and test hypotheses
- The fundamentals of probability and how to use scipy to work with distributions in Python
- How to build and evaluate machine learning models with scikit-learn
- The basics of data visualization and how to communicate your results effectively
- The importance of creating reproducible analyses and how to share them effectively
Who Should Take This Course
- Aspiring data scientists looking to break into the field and learn the essentials necessary.
- Journalists, consultants, analysts, or anyone else who works with data looking to take a programmatic approach to exploring data and conducting analyses.
- Quantitative researchers interested in applying theory to real projects and taking a computational approach to modeling.
- Software engineers interested in building intelligent applications driven by machine learning.
- Practicing data scientists already familiar with another programming environment looking to learn how to do data science with Python.
Course Requirements
- Basic understanding of programming.
- Familiarity with Python and statistics is a plus.
Lesson 7: Exploring Data—Analysis and Visualization
Lesson 7 starts with a short historical diversion on the process and evolution of exploratory data analysis, to help you understand the context behind it. John Tukey, the godfather of EDA, said in "The Future of Data Analysis" that "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
Next you use matplotlib and seaborn, two Python visualization libraries, to learn how to visually explore a single dimension with histograms and boxplots. But a single dimension can only get us so far. By using scatterplots and other charts for higher-dimensional visualization, you see how to compare columns of your data to look for relationships between them.
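A minimal sketch of this kind of exploration, with a made-up DataFrame standing in for the course's data, might look like the following.

```python
# A hedged sketch of single-variable and two-variable exploration with
# matplotlib and seaborn. The DataFrame and column names are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "price": [80, 120, 75, 200, 95, 150],
    "rating": [4.7, 4.9, 4.2, 4.8, 4.5, 4.6],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(df["price"], bins=5)           # distribution of one column
sns.boxplot(y=df["price"], ax=axes[1])      # same column, five-number summary
axes[2].scatter(df["price"], df["rating"])  # relationship between two columns
plt.tight_layout()
plt.show()
```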
The lesson finishes with a cautionary tale of when statistics lie by exploring the impact of mixed effects and Simpson’s paradox.
Lesson 8: Making Inferences—Statistical Estimation and Evaluation
In Lesson 8 we lay the groundwork for the methods and theory we need to make inferences from data, starting with an overview of the various approaches and techniques that are part of the rich history of statistical analysis.
Next you see how to leverage computational- and sampling-based approaches to make inferences from your data. After learning the basics of hypothesis testing, one of the most used techniques in the data scientist’s tool belt, you see how to use it to optimize a web application with A/B testing. All along the way you learn to appreciate the importance of uncertainty and see how to bound your reasoning with confidence intervals.
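As an illustration of the sampling-based approach (not the course's exact example), the sketch below runs a permutation test on the difference in conversion rates between two variants of a web page, using simulated data.

```python
# A hedged illustration of a sampling-based A/B test: a permutation test on
# the difference in conversion rates, using made-up data.
import numpy as np

rng = np.random.default_rng(0)
a = rng.binomial(1, 0.10, size=1000)   # variant A: ~10% conversion
b = rng.binomial(1, 0.12, size=1000)   # variant B: ~12% conversion
observed = b.mean() - a.mean()

pooled = np.concatenate([a, b])
diffs = []
for _ in range(5000):
    rng.shuffle(pooled)                 # re-label visitors at random
    diffs.append(pooled[1000:].mean() - pooled[:1000].mean())

p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed lift: {observed:.3f}, p-value: {p_value:.3f}")
```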
And finally, the lesson finishes by discussing the age-old question of correlation versus causation, why it matters, and how to account for it in your analyses.
Lesson 9: Statistical Modeling and Machine Learning
In Lesson 9 you learn how to leverage statistical modeling to build a powerful model that predicts AirBnB listing prices and infers which listings are undervalued. It starts with a primer on probability and statistical distributions using SciPy and NumPy, including how to estimate parameters and fit distributions to data.
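A small sketch of what parameter estimation with SciPy can look like, using simulated data rather than the course's AirBnB columns:

```python
# A minimal sketch of estimating distribution parameters with SciPy and NumPy.
# The data are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
log_prices = rng.normal(loc=4.5, scale=0.5, size=2000)  # simulated log prices

mu, sigma = stats.norm.fit(log_prices)   # maximum-likelihood estimates
print(f"estimated mean: {mu:.2f}, estimated std: {sigma:.2f}")

# A goodness-of-fit check: compare the sample to the fitted distribution.
result = stats.kstest(log_prices, "norm", args=(mu, sigma))
print(f"KS statistic: {result.statistic:.3f}, p-value: {result.pvalue:.3f}")
```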
Next you learn the theory of regression through a hands-on application with the AirBnB data and see how to model correlations in your data. By solving for the line of best fit and learning how to interpret its coefficients, you can make inferences about your data.
But building a model is only one side of the coin; if you cannot effectively evaluate how well it performs, it might as well be useless. Next you learn how to evaluate a regression model, what can go wrong when fitting one, and how to overcome those challenges.
The lesson finishes by talking about the differences between and nuances of statistics, modeling, and machine learning. I provide an overview of the various types of models and algorithms used for machine learning and introduce how to leverage scikit-learn—a robust machine learning library in Python—to make predictions.
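As a hedged sketch of the scikit-learn workflow the lesson introduces, the example below fits a linear regression on made-up listing features and evaluates it on held-out data.

```python
# A hedged sketch of fitting and evaluating a regression model with
# scikit-learn; the features and data are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
bedrooms = rng.integers(1, 5, size=500)
rating = rng.uniform(3.0, 5.0, size=500)
price = 40 * bedrooms + 20 * rating + rng.normal(0, 15, size=500)

X = np.column_stack([bedrooms, rating])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Held-out error guards against judging the model only on data it has seen.
print("coefficients:", model.coef_)
print("test MAE:", mean_absolute_error(y_test, predictions))
```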
About LiveLessons Video Training
The LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home and Office Technologies, Business and Management, and more. View all LiveLessons on InformIT at: http://www.informit.com/livelessons
About this course
One of the principal responsibilities of a data scientist is to make reliable predictions based on data. When the amount of data available is enormous, it helps if some of the analysis can be automated. Machine learning is a way of identifying patterns in data and using them to automatically make predictions or decisions. In this data science course, you will learn basic concepts and elements of machine learning.
The two main methods of machine learning you will focus on are regression and classification. Regression is used when you seek to predict a numerical quantity. Classification is used when you try to predict a category (e.g., given information about a financial transaction, predict whether it is fraudulent or legitimate).
For regression, you will learn how to measure the correlation between two variables and compute a best-fit line for making predictions when the underlying relationship is linear. The course will also teach you how to quantify the uncertainty in your prediction using the bootstrap method. These techniques will be motivated by a wide range of examples.
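As a rough sketch of these ideas, the example below fits a least-squares line and uses the bootstrap to put an interval around its slope. Plain NumPy is used here and the data are simulated; the course's own tooling and datasets will differ.

```python
# A hedged sketch: fit a best-fit line, then bootstrap an interval for the
# slope by resampling (x, y) pairs with replacement. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 3, size=200)   # roughly linear relationship

slope, intercept = np.polyfit(x, y, 1)      # least-squares best-fit line

boot_slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))   # resample rows with replacement
    s, _ = np.polyfit(x[idx], y[idx], 1)
    boot_slopes.append(s)

lo, hi = np.percentile(boot_slopes, [2.5, 97.5])
print(f"slope: {slope:.2f}, 95% bootstrap interval: ({lo:.2f}, {hi:.2f})")
```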
For classification, you will learn the k-nearest neighbor classification algorithm, learn how to measure the effectiveness of your classifier, and apply it to real-world tasks including medical diagnoses and predicting genres of movies.
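A minimal from-scratch sketch of the k-nearest neighbor idea, with made-up two-feature examples rather than the course's datasets:

```python
# A minimal sketch of k-nearest neighbor classification by majority vote.
# Training examples and features are hypothetical.
from collections import Counter
from math import dist

train = [((1.0, 2.0), "A"), ((1.5, 1.8), "A"), ((5.0, 8.0), "B"), ((6.0, 9.0), "B")]

def knn_predict(point, training_data, k=3):
    """Label a point by majority vote among its k closest training examples."""
    neighbors = sorted(training_data, key=lambda row: dist(point, row[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((1.2, 2.1), train))   # expected: "A"
```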
The course will highlight the assumptions underlying the techniques, and will provide ways to assess whether those assumptions are good. It will also point out pitfalls that lead to overly optimistic or inaccurate predictions.
What you’ll learn
- Fundamental concepts of machine learning
- Linear regression, correlation, and the phenomenon of regression to the mean
- Classification using the k-nearest neighbors algorithm
- How to compare and evaluate the accuracy of machine learning models
- Basic probability and Bayes’ theorem

Professional Certificate in Foundations of Data Science
A data science program for everyone.
Prerequisites
- Foundations of Data Science: Computational Thinking with Python
- Foundations of Data Science: Inferential Thinking by Resampling
Meet Your Instructors
- Ani Adhikari
- John DeNero
- David Wagner
About this course
Analytical models are key to understanding data, generating predictions, and making business decisions. Without models it’s nearly impossible to gain insights from data. In modeling, it’s essential to understand how to choose the right data sets, algorithms, techniques and formats to solve a particular business problem.
In this course, part of the Analytics: Essential Tools and Methods MicroMasters® program, you’ll gain an intuitive understanding of fundamental models and methods of analytics and practice how to implement them using common industry tools like R.
You’ll learn about analytics modeling and how to choose the right approach from among the wide range of options in your toolbox.
You will learn how to use statistical models and machine learning as well as models for:
- classification;
- clustering;
- change detection;
- data smoothing;
- validation;
- prediction;
- optimization;
- experimentation;
- decision making.
What you’ll learn
- Fundamental analytics models and methods
- How to use analytics software, including R, to implement various types of models
- Understanding of when to apply specific analytics models

MicroMasters® Program in Analytics: Essential Tools and Methods
Learn in-demand skills to maximize your impact as an analyst.
Prerequisites
- Probability and statistics
- Basic programming proficiency
- Linear algebra
- Basic calculus
Who can take this course?
Unfortunately, learners from one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. EdX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.
Meet Your Instructors
- Joel Sokol
About this course
Today, businesses, consumers, and societies leave behind massive amounts of data as a by-product of their activities. Leading-edge companies in every industry are using analytics to replace intuition and guesswork in their decision-making. As a result, managers are collecting and analyzing enormous data sets to discover new patterns and insights and running controlled experiments to test hypotheses.
This course prepares students to understand business analytics and become leaders in this area within business organizations. It teaches the scientific process of transforming data into insights for making better business decisions and covers the methodologies, issues, and challenges related to analyzing business data. It illustrates the process of analytics by having students apply business analytics algorithms and methodologies to business problems. The use of examples places business analytics techniques in context and teaches students how to avoid common pitfalls, emphasizing the importance of applying the proper techniques.
What you’ll learn
After taking this course, students should be able to:
- approach business problems data-analytically, thinking carefully and systematically about whether and how data and business analytics can improve business performance.
- develop business analytics ideas, analyze data using business analytics software, and generate business insights.

MicroMasters® Program in Analytics: Essential Tools and Methods
Learn in-demand skills to maximize your impact as an analyst.
Prerequisites
Computing for Data Analysis, Introduction to Analytics Modeling, and each of their prerequisites
Who can take this course?
Unfortunately, learners from one or more of the following countries or regions will not be able to register for this course: Iran, Cuba and the Crimea region of Ukraine. While edX has sought licenses from the U.S. Office of Foreign Assets Control (OFAC) to offer our courses to learners in these countries and regions, the licenses we have received are not broad enough to allow us to offer this course in all locations. EdX truly regrets that U.S. sanctions prevent us from offering all of our courses to everyone, no matter where they live.
Meet Your Instructors
