COMP/MATH 350: Applied Machine Learning, Fall 2018

This syllabus, especially the schedule, is subject to change based on specific class needs. Significant deviations will be discussed in class.

Logistics

Content

Description

An introduction to the hot topics of machine learning, data science and data mining. The course aims to supply students with a useful toolbox of machine learning techniques that can be applied to real-life data. Techniques may include logistic and linear regression, SVMs, decision trees, neural networks, and clustering. The focus will be on developing important skills in preparing data and selecting and evaluating models, though we will delve into the mathematical intuition behind each model.

Topics

Possible topics include:

Learning Objectives

Sources

The required course textbook is:

Kuhn, Max and Johnson, Kjell. Applied Predictive Modeling. Springer. 2013. ISBN-13: 978-1-4614-6848-6.

I also recommend, but do not require:

Guido, Sarah and Muller, Andreas. Introduction to Machine Learning with Python. O’Reilly. 2016. ISBN-13: 978-1449369415.

Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron. Deep Learning. MIT Press. 2016. ISBN: 9780262035613.

Policies

Assessment

Assignments

The course workload is as follows:

Category Number of Assignments Final Grade Weight
Homework 5–7 50%
Midterm 1 20%
Final 1 20%
Participation - 10%

Most (probably all) homework assignments will involve programming. Each exam focuses primarily, but not necessarily exclusively, on material covered since the previous exam. In other words, the final exam may include one or two questions from first-half material.

Your participation grade is based on a variety of activities. During class I will often make use of the Socrative app, so you’ll need to install it on your phone. Participating in Socrative questions and in-class group activities is required for a decent participation grade; a full grade also requires asking questions, either in class or in office hours.

Grading

Your final grade is based on a weighted average of particular assignment categories, with weights shown above. You can estimate your current grade based on your scores and these weights. You may always visit the instructor outside of class to discuss your current standing.
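As a rough illustration of the estimate described above, the following Python sketch computes a weighted average from the category weights in the Assignments table. The function name `estimate_grade`, the example scores, and the choice to rescale by the weights of only those categories that already have scores are assumptions made for illustration; this is not an official grade calculator.

```python
# Category weights (in percent) taken from the Assignments table.
WEIGHTS = {"Homework": 50, "Midterm": 20, "Final": 20, "Participation": 10}


def estimate_grade(scores):
    """Estimate a current percentage grade.

    scores maps a category name to a list of percentage scores earned so far.
    Categories with no scores yet are skipped, and the result is rescaled by
    the weights of the categories that do have scores (an assumption of this
    sketch, not a stated course policy).
    """
    weighted_sum = 0.0
    weight_used = 0
    for category, weight in WEIGHTS.items():
        values = scores.get(category, [])
        if not values:
            continue  # nothing graded in this category yet
        category_average = sum(values) / len(values)
        weighted_sum += weight * category_average
        weight_used += weight
    if weight_used == 0:
        raise ValueError("no scores recorded yet")
    return weighted_sum / weight_used


# Hypothetical scores partway through the semester (no final exam yet).
example = {"Homework": [90, 80], "Midterm": [75], "Participation": [100]}
print(round(estimate_grade(example), 1))
```

Once every category has at least one score, the rescaling has no effect and the result is the plain weighted average.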

This course uses a standard grading scale. Assignments and final grades will not be curved except in rare cases when the instructor deems it necessary. Percentage grades translate to letter grades as follows:

Score Grade
94–100 A
90–93 A-
88–89 B+
82–87 B
80–81 B-
78–79 C+
72–77 C
70–71 C-
68–69 D+
62–67 D
60–61 D-
0–59 F
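The scale above can be sketched as a small lookup in Python. The name `letter_grade` is hypothetical, and the table does not say how fractional scores that fall between two bands (for example 93.5) are handled; this sketch assumes a score belongs to the highest band whose lower bound it reaches, so 93.5 would map to A-.

```python
# Lower bound of each band from the grading scale, highest first.
GRADE_BANDS = [
    (94, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
    (0, "F"),
]


def letter_grade(score):
    """Map a percentage score (0 to 100) to a letter grade.

    Assumption: a score belongs to the highest band whose lower bound it
    meets or exceeds; no rounding is applied first.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter


for sample in (95, 89, 71, 59):
    print(sample, letter_grade(sample))
```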

You are always welcome to challenge a grade that you feel is unfair or calculated incorrectly. Mistakes made in your favor will never be corrected to lower your grade. Mistakes made not in your favor will be corrected. Basically, after the initial grading your score can only go up as the result of a challenge.

Workload

The weekly workload for this course will vary by student and over the semester, but on average should be about 12 hours per week. The following table provides a rough estimate of how this time is distributed over the course components for a 16-week semester.

Category Total Time (Hours) Time/Week (Hours)
Lectures 55 3.5
Homework 72 4.5
Exam Study 27 1.5
Reading+Unstructured Study - 2.5
Total - 12

Schedule

The following tentative calendar should give you a feel for how work is distributed throughout the semester. Assignments and events are listed in the week they are due or when they occur. This calendar is subject to change based on the circumstances of the course.

Date Topic Assignment
Wed 08/22 (Week 1) Intro, What is Machine Learning (pdf)  
Fri 08/24 ML Principles; Python Do a Python tutorial, e.g. this one
Mon 08/27 (Week 2) Essential Python Libraries HWK 1 out
Tue 08/28 Python Visualization Basics Read APM 1-2
Wed 08/29 Classification and Regression Case Studies Read IMLP 1
Fri 08/31 Model Complexity and Sources of Error Read APM 4.1-4.2
Mon 09/03 (Week 3) Model Tuning Read APM 4.3-4.4
Tue 09/04 Comparing Models Read 5.1-5.2
Wed 09/05 Linear and Ridge Regression HWK 1 due (Solutions), HWK 2 out, Read APM 6.1-6.2
Fri 09/07 Understanding Regularization Read APM 6.4
Mon 09/10 (Week 4) Linear Models for (Binary) Classification Read APM 12.1-12.2
Tue 09/11 Multiclass Classification Read 12.5
Wed 09/12 Computational Considerations; SGD  
Fri 09/14 Scaling Data Read APM Ch 3
Mon 09/17 (Week 5) Review and HWK 2 Questions HWK 3 out
Tue 09/18 Processing Pipelines and Feature Engineering  
Wed 09/19 Feature Engineering HWK 2 due (Solutions)
Fri 09/21 Imputation Read APM 3.4
Mon 09/24 (Week 6) Feature Selection Read APM 19
Tue 09/25 Support Vector Machines (no slides) Read APM 7.3
Wed 09/26 SVM Kernels Read APM 13.4
Fri 09/28 Trees and Forests Read APM 8.1-8.2, 14.1
Mon 10/01 (Week 7) Homework 2 Review and Random Forests Read APM 8.4-8.5, 14.3-14.4
Tue 10/02 Ensembles and Gradient Boosting Read APM 8.6, 14.5
Wed 10/03 Stacking  
Fri 10/05 Calibration HWK 3 checkpoint due, Practice Midterm out
Mon 10/08 (Week 8) Review Practice Midterm Solutions
Tue 10/09 Review  
Wed 10/10 Midterm (Solutions)  
(10/12–10/15) (Fall Break)  
Tue 10/16 (Week 9) Midterm Review HWK 3 due (Solutions)
Wed 10/17 Mean, Median, Mode  
Fri 10/19 Model Evaluation Metrics Read APM 11 & 5
Mon 10/22 (Week 10) Imbalanced Data Read APM 16
Tue 10/23 Synthetic Data Generation  
(Wed 10/24) (Mentoring Day – No class)  
Fri 10/26 Dimensionality Reduction  
Mon 10/29 (Week 11) (Class Cancelled)  
Tue 10/30 Dimensionality Reduction (continued) Read IMLP 142-157, 165-170, HWK 4 out
Wed 10/31 (Class Cancelled)  
Fri 11/02 Clustering Read IMLP 170-193
Mon 11/05 (Week 12) (Class Cancelled)  
Tue 11/06 Mixture Models and Clustering Evaluation Read IMLP 193-211
Wed 11/07 Unsupervised Clustering Evaluation  
Fri 11/09 More Clustering Evaluation  
Mon 11/12 (Week 13) NMF and Outlier Detection Read IMLP 158-170
Tue 11/13 Working with text data Read IMLP 325-336
Wed 11/14   Read IMLP 337-349
Fri 11/16 LSA HWK 4 due, Read IMLP 349-358
Mon 11/19 (Week 14) Topic Models HWK 5 out
Tue 11/20 Word and document embeddings  
(11/21–11/25) (Thanksgiving Break)  
Mon 11/26 (Week 15) (Class Cancelled)  
Tue 11/27 Neural networks Skim DL Ch. 6
Wed 11/28   Read APM 7.1 and 13.2
Fri 11/30 NN in Practice HWK 6 out
Mon 12/03 (Week 16) Advanced NN Watch these, Practice Final
Tue 12/04    
Wed 12/05 Review Practice Final Solutions
Tue 12/11 3:00 PM Final Exam  

Monmouth College Services