Data Science with R: Machine Learning

Offered by NYC Data Science Academy

This 35-hour course introduces both the theoretical foundations of machine learning algorithms and the practical application of machine learning techniques in R. It covers data mining, performance measures, dimension reduction, regression models (both linear and generalized), kNN and Naïve Bayes models, tree models, SVMs, cluster analysis, neural networks, and association rule analysis. After successfully completing this course, you will be able to break down the mathematics behind major machine learning algorithms, explain the principles behind them, and implement these methods to solve real-world problems.


Unit 1: Foundations of Statistics and Simple Linear Regression
Understanding your data
Statistical inference
Introduction to machine learning
Simple linear regression
Diagnostics and transformations
The coefficient of determination

Unit 2: Multiple Linear Regression and Generalized Linear Models
Multiple linear regression
Assumptions and diagnostics
Extending model flexibility
Generalized linear models
Logistic regression
Maximum likelihood estimation
Model interpretation
Assessing model fit

Unit 3: kNN, Naive Bayes, and the Curse of Dimensionality
The k-nearest neighbors (kNN) algorithm
The choice of k and the distance measure
Conditional probability: Bayes’ theorem
The Naive Bayes algorithm
The Laplace estimator
Dimension reduction
The PCA procedure
Ridge and Lasso regression

Unit 4: Tree Models and SVMs
Decision trees
Random forests
Variable importance
Hyperplanes and the maximal margin classifier
Soft margin and the support vector classifier
Kernels and support vector machines

Unit 5: Cluster Analysis and Neural Networks
Cluster analysis
K-means clustering
Hierarchical clustering
Neural networks and perceptrons
Sigmoid neurons
Network topology and hidden features
Backpropagation learning with gradient descent

Final Project

After 35 hours of structured lectures, students are encouraged to work on an exploratory data analysis project based on their own interests. A project presentation will be arranged afterwards.