COMP 347: Applied Machine Learning, Fall 2022

This syllabus, especially the schedule, is subject to change based on class needs. Significant deviations will be discussed in class. Individual exceptions to the policies and schedule are granted only in cases of true emergency. Please make arrangements with me if an emergency arises.

Logistics

Content

Description

An introduction to machine learning with topics in data science and data mining. The course aims to supply students with a useful toolbox of machine learning techniques that can be applied to real-life data. Techniques may include logistic and linear regression, SVMs, decision trees, neural networks, and clustering. The focus will be on developing important skills in preparing data and selecting and evaluating models, though we will delve into the mathematical intuition behind each model.

Topics

Possible topics include:

Learning Objectives

Sources

The required course textbook is:

Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition). O’Reilly. 2019. ISBN-13: 978-1492032649.

I also recommend, but do not require:

Guido, Sarah and Müller, Andreas. Introduction to Machine Learning with Python. O’Reilly. 2016. ISBN-13: 978-1449369415.

We might also use some material from:

Deisenroth, Marc Peter and Faisal, A. Aldo and Ong, Cheng Soon. Mathematics for Machine Learning. Cambridge University Press. 2020. ISBN-13: 978-1108455145. Available at https://mml-book.github.io/.

Programming Environment

We’ll be using Python 3 along with several of the standard data science libraries available, including matplotlib, pandas, numpy, and scikit-learn, among others. The easiest way to get set up is to install the Anaconda Python distribution, which includes everything you need. It also includes the Spyder IDE for development, although you are free to develop in any text editor you like. You may also use the department server if you’d like; just send me an email.
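One quick way to confirm that an installation has the libraries named above is a short check script. This is a minimal sketch, not an official course tool: the function name `check_environment` and the `REQUIRED` list are illustrative, and the list covers only the four libraries mentioned in this section.

```python
from importlib import util

# Import names of the libraries named in this section.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: util.find_spec(name) is not None for name in modules}


# Print one status line per library.
for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If any line reports MISSING, the library is not visible to the Python interpreter that ran the script, which often means a different interpreter than the Anaconda one is being used.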

Policies

Assessment

Assignments

The course workload is as follows:

Category Number of Assignments Final Grade Weight
Homework 5–7 50%
Midterm 1 20%
Final 1 20%
Participation - 10%

Most (probably all) homework assignments will involve programming. Each exam focuses primarily, but not necessarily exclusively, on material covered since the previous exam. In other words, the final exam may include one or two questions from first-half material.

Your participation grade is based on a variety of activities. During class I will often make use of the Socrative app, so you’ll need to install it on your phone. Participating in Socrative questions and in-class group activities is required for a decent participation grade; a full grade also includes asking questions either in class or in office hours.

Grading

Your final grade is based on a weighted average of particular assignment categories, with weights shown above. You can estimate your current grade based on your scores and these weights. You may always visit the instructor outside of class to discuss your current standing.
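The estimate described above can be sketched in a few lines of Python. The weights come from the workload table; everything else is an assumption for illustration: the function name `estimate_grade`, the example scores, and the choice to renormalize over only the categories graded so far (the syllabus does not say how to treat categories without scores).

```python
# Category weights from the workload table in this syllabus.
WEIGHTS = {"Homework": 0.50, "Midterm": 0.20, "Final": 0.20, "Participation": 0.10}


def estimate_grade(scores, weights=WEIGHTS):
    """Weighted average over the categories that have a score so far.

    scores maps a category name to its average percentage (0-100).
    Assumption: categories without a score are left out and the
    remaining weights are rescaled to sum to 1.
    """
    graded = {cat: w for cat, w in weights.items() if cat in scores}
    if not graded:
        raise ValueError("no graded categories yet")
    total_weight = sum(graded.values())
    return sum(scores[cat] * w for cat, w in graded.items()) / total_weight


# Hypothetical scores partway through the semester, before the final exam.
current = estimate_grade({"Homework": 88, "Midterm": 79, "Participation": 95})
print(f"Estimated current grade: {current:.1f}%")
```

Once all four categories have scores, the rescaling has no effect and the result is the plain weighted average.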

This course uses a standard grading scale. Assignments and final grades will not be curved except in rare cases when it’s deemed necessary by the instructor. Percentage grades translate to letter grades as follows:

Score Grade
94–100 A
90–93 A-
88–89 B+
82–87 B
80–81 B-
78–79 C+
72–77 C
70–71 C-
68–69 D+
62–67 D
60–61 D-
0–59 F
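The scale above can be expressed as a small lookup. This is a sketch under one stated assumption: the table lists whole-number ranges and does not say how fractional percentages are handled, so the code treats each range's lower bound as a cutoff with no rounding (93.9 maps to A-). The names `SCALE` and `letter_grade` are illustrative.

```python
# Lower bound of each letter grade, taken from the scale above, highest first.
SCALE = [
    (94, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(score):
    """Map a percentage to a letter grade using lower-bound cutoffs.

    Assumption: no rounding is applied before the lookup.
    """
    for lower, letter in SCALE:
        if score >= lower:
            return letter
    return "F"  # anything below 60


for pct in (95, 87, 71, 59):
    print(pct, letter_grade(pct))
```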

You are always welcome to challenge a grade that you feel is unfair or calculated incorrectly. Mistakes made in your favor will never be corrected to lower your grade. Mistakes made not in your favor will be corrected. Basically, after the initial grading your score can only go up as the result of a challenge*.

Workload

The weekly workload for this course will vary by student and over the semester, but on average should be about 12 hours per week. The following table provides a rough estimate of the distribution of this time over different course components for a 16-week semester.

Category Total Time (Hours) Time/Week (Hours)
Lectures 55 2.5
Homework 72 4.5
Exam Study 27 1.5
Reading+Unstructured Study - 2.5
Total - 11

Schedule

The following tentative calendar should give you a feel for how work is distributed throughout the semester. Assignments and events are listed in the week they are due or when they occur. This calendar is subject to change based on the circumstances of the course.

Date Topic Assignment/Reading
Wed 08/24 (Week 1) Intro and Logistics  
Fri 08/26 ML Landscape Ch. 1 (p. 1–32)
Mon 08/29 (Week 2) Python Libs, NumPy NumPy Tutorial, Hwk 1
Wed 08/31 Pandas Pandas Tutorials (all except time series)
Fri 09/02 Visualization (matplotlib) Matplotlib Basic Usage
Mon 09/05 (Week 3) Regression Case Study (notebook) p. 35–61
Wed 09/07 Model Selection & Validation p. 62–83
Fri 09/09 Preprocessing I Hwk 2
Mon 09/12 (Week 4) Preprocessing II p. 85–100
Wed 09/14 Imputation; Binary Classification Evaluation p. 100–109
Fri 09/16 Multiclass Classification  
Mon 09/19 (Week 5) Calibration & Imbalanced Data  
Wed 09/21 Calculus Review 3blue1brown (Ch. 1-5)
Fri 09/23 Basic Linear Algebra 3blue1brown (All but ch. 12), Hwk 3
Mon 09/26 (Week 6) Training Models p. 111–128, Hwk 3 Leaderboard
Wed 09/28 Feature Engineering (notebook)  
Fri 09/30 Complexity & Regularization p. 128–142
Mon 10/03 (Week 7) (Class Cancelled) p. 142–151
Wed 10/05 Logistic Regression and SVMs p. 153–164
Fri 10/07 SVM Kernels p. 164–174
Mon 10/10 (Week 8) Midterm  
(Wed 10/12) (Fall Break)  
(Fri 10/14) (Fall Break)  
Mon 10/17 (Week 9) Midterm Solutions  
Wed 10/19 Decision Trees p. 175–187
Fri 10/21 Ensembles p. 189–211
Mon 10/24 (Week 10) Hwk 3 Questions  
Wed 10/26 (No class – work on Hwk 3)  
Fri 10/28 Model Interpretation & Feature Selection  
Mon 10/31 (Week 11) Dimensionality Reduction Ch. 8
Wed 11/02 Clustering  
Fri 11/04 Mixture Models  
Mon 11/07 (Week 12) (No class)  
Wed 11/09 NMF and Outlier Detection  
Fri 11/11 Working with Text  
Mon 11/14 (Week 13) More Text: Topic Models Hwk 4
Wed 11/16 LDA Final Project
Fri 11/18 Word and Document Embeddings  
Mon 11/21 (Week 14) Hwk 3 Review  
(Wed 11/23) (Thanksgiving Break)  
(Fri 11/25) (Thanksgiving Break)  
Mon 11/28 (Week 15) Neural Networks Ch. 10
Wed 11/30    
Fri 12/02 Training Neural Networks Ch. 11, Hwk 5
Mon 12/05 (Week 16) Transfer Learning; RNNs  
Wed 12/07 Transformers  
Sat 12/10 3:00 PM Final Exam  

Monmouth College Services