Assignments

Throughout the course you will implement several ML algorithms: K-Nearest Neighbors, Perceptron, Logistic and Linear Regression, Support Vector Machine, Decision Tree, Deep Neural Network, and Value Iteration. The implementation guidelines for each algorithm are given below.

Instructions

Set up

All assignments were prepared using Python 3.6. Please use a package manager such as Anaconda or pip to install the required packages listed below. We highly recommend setting up an Anaconda or virtual environment to avoid dependency hell.

  • joblib==1.0.1
  • jupyter==1.0.0
  • matplotlib==3.3.4
  • numpy==1.19.2
  • pandas==1.1.5
  • pip==21.0.1
  • scikit-learn==0.24.2
  • scipy==1.5.3

Tip 1: you can quickly install the above packages by (1) installing Anaconda and (2) running conda env create --file <path to requirements.txt file>. The requirements.txt file can be found here.
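After installing, you may want to confirm that your environment matches the pinned versions listed above. The following is a minimal sketch, not part of the course materials: the helper names `installed_version` and `find_mismatches` are hypothetical, and the script assumes it is run inside the environment you just created.

```python
# Pinned versions copied from the package list above.
REQUIRED = {
    "joblib": "1.0.1",
    "jupyter": "1.0.0",
    "matplotlib": "3.3.4",
    "numpy": "1.19.2",
    "pandas": "1.1.5",
    "pip": "21.0.1",
    "scikit-learn": "0.24.2",
    "scipy": "1.5.3",
}


def installed_version(name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every listed package is installed at the pinned version.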

Tip 2: you may find PyCharm convenient for debugging. See this guide for setting up and managing a virtual environment in PyCharm.

Help

Questions, comments, and clarifications regarding the assignments should NOT be sent via email to the course staff. Please use the discussion board on Campuswire (class code: 6453) instead.

Submission

When submitting a programming assignment, upload the associated .ipynb file (Jupyter notebook), named with your UIN. For example: '123123123.ipynb'. Students auditing the class (not registered) are NOT allowed to upload a submission.

When submitting only a written assignment, upload a single PDF file containing your solution. The file name should be your UIN. For example: '123123123.pdf'. Students auditing the class (not registered) are NOT allowed to upload a submission.




0. Preliminaries (no need to submit answers; just review)




Programming assignment 0: Python Warmup (due Monday, Sep 6)

This assignment tests the basics of Python, NumPy, and Pandas, which are the bread and butter of implementing machine learning algorithms. You are therefore expected to be able to complete it without breaking a sweat. The questions range from easy to medium in difficulty, so if you are unfamiliar with NumPy and Pandas, the assignment should be a good way to get up to speed.

This assignment can be downloaded from here: Intro to Python. It is a Jupyter notebook; you can find more information on setting up and running Jupyter notebooks here. The notebook consists of 15 coding questions covering the basics of Python, NumPy, and Pandas. Complete the functions defined in the code blocks following each question; you only need to fill in the sections marked "YOUR CODE HERE". Each question comes with a few test cases that you may use to verify your implementation. Do NOT modify these test cases, since we will use them to grade your solutions. Once you've filled in your solutions, submit the notebook on Canvas (see the instructions on Submission). Do NOT forget to type your name and UIN at the beginning of the notebook.
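To illustrate the workflow described above, here is a hypothetical question in the same spirit. It is not taken from the notebook: the function `count_vowels` and its test cases are invented for illustration, and the actual questions may look different.

```python
def count_vowels(text):
    """Hypothetical warm-up question: return the number of vowels in `text`."""
    # YOUR CODE HERE (the line below stands in for a student's solution)
    return sum(1 for ch in text.lower() if ch in "aeiou")


# Provided test cases: run them to check your work, but leave them unmodified.
assert count_vowels("Machine Learning") == 6
assert count_vowels("") == 0
```

If a test case fails, the cell raises an AssertionError, which tells you the function body still needs work.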



Written assignment 1: Linear Algebra and Probability Review, k-Nearest Neighbors (due Wednesday, Sep 15)

This assignment consists of 2 parts. Part 1 covers the basics of linear algebra and probability. Part 2 contains questions on the k-Nearest Neighbors algorithm.

The assignment can be downloaded from here: Written Assignment 1. It is a PDF file. Blank space is left after each question for your solutions. You may either typeset them in LaTeX (download the source .tex file here) or handwrite them, as long as they are legible. Refer to the submission instructions for uploading your solutions to Canvas.



Programming assignment 1: Perceptron (due Wednesday, Sep 22)

In this assignment, you'll be coding up the perceptron algorithm from scratch. You are NOT allowed to use machine learning libraries such as scikit-learn for this assignment.

The assignment can be downloaded here: Programming Assignment 1. Please follow the instructions in the notebook. Refer to the submission instructions for uploading your solutions to Canvas.
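As a reminder of what "from scratch" involves, the following is a generic sketch of the classic perceptron update rule. It assumes labels in {-1, +1} and plain Python lists; the function names and the `epochs` parameter are illustrative, and the notebook's required interface is not shown here and may differ.

```python
def train_perceptron(X, y, epochs=100):
    """Train a perceptron on feature lists X with labels y in {-1, +1}."""
    w = [0.0] * len(X[0])  # one weight per feature
    b = 0.0                # bias term
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            activation = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
            if y_i * activation <= 0:  # misclassified, or exactly on the boundary
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += y_i
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: the data are separated
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) + b > 0 else -1


# Tiny linearly separable example.
X = [[2, 1], [3, 2], [-1, -1], [-2, -3]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

On linearly separable data the loop stops as soon as a full pass makes no mistakes; on non-separable data it simply runs for the given number of epochs.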



Written assignment 2: Generative models (due Monday, Sep 27)

The assignment can be downloaded here: W2.

Refer to the submission instructions for uploading your solutions to Canvas.



Programming assignment 2: Linear and Logistic Regression (due Tuesday, Oct 26)

In this assignment, you'll be coding up linear and logistic regression algorithms from scratch. You are NOT allowed to use machine learning libraries such as scikit-learn for this assignment.

The assignment can be downloaded here: Programming Assignment 2. Download the associated datasets here: heights_weights.csv, advertising.csv, banknote_authentication.csv. Place the dataset files in the same directory as the notebook. Please follow the instructions in the notebook. Refer to the submission instructions for uploading your solutions to Canvas.
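For orientation, here is a small sketch of two standard building blocks: the closed-form least-squares fit for a single feature, and a numerically stable sigmoid, which logistic regression uses to turn a score into a probability. It is written in plain Python on made-up numbers; the function names are illustrative, and the notebook may ask for a different formulation (for example, gradient descent with NumPy).

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for one feature (assumes xs are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Logistic function, computed so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up data lying exactly on the line y = 2x + 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

Branching on the sign of `z` in `sigmoid` keeps the argument of `math.exp` non-positive, so the exponential never overflows.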



Programming assignment 3: Support Vector Machines (due Tuesday, Nov 16)

In this assignment, you'll be training support vector machines for classification.

The assignment can be downloaded here: Programming Assignment 3. Download the associated datasets here: satimage_train.csv, satimage_test.csv. Place the dataset files in the same directory as the notebook. Please follow the instructions in the notebook. Refer to the submission instructions for uploading your solutions to Canvas.



Programming assignment 4: Decision Trees (due Monday, Nov 29; extended from Thursday, Nov 25)

In this assignment, you'll be coding up decision trees for classification and regression from scratch. You are NOT allowed to use machine learning libraries such as scikit-learn for this assignment.

The assignment can be downloaded here: Programming Assignment 4. Download the associated datasets here: noisy_sin_subsample_2.csv and new_circle_data.csv. Place the dataset files in the same directory as the notebook. Please follow the instructions in the notebook. Refer to the submission instructions for uploading your solutions to Canvas.
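The core step in growing a classification tree is choosing the split that makes the child nodes as pure as possible. The sketch below shows one common way to do this for a single numeric feature using Gini impurity. The function names are illustrative, the impurity measure the notebook asks for may differ (entropy is another common choice), and a regression tree would typically score splits by variance or squared error instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Made-up feature values and labels.
threshold, impurity = best_threshold([1, 2, 3, 4], [0, 0, 1, 1])
```

A full tree applies this search to every feature at every node and recurses on the two children until a stopping condition, such as a maximum depth, is met.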



Programming assignment 5 + Competition: Deep Learning (due Monday, Dec 13)

In this assignment and competition, you'll use PyTorch to build a convolutional neural network from scratch to classify images.

The assignment can be downloaded here: Programming Assignment 5. To install PyTorch and related packages, please follow the instructions in the notebook. Refer to the submission instructions for uploading your solutions to Canvas.

Competition Rules

  • The maximum number of parameters you are allowed to use in your network is 100,000.
  • Final rankings will be based on the test accuracy of your final model on a private test set that we will use for evaluation. This test set is different from the one provided in the assignment.
  • The data distribution of the private test set is very similar to the training and test set provided to you. So, if your model performs well on the provided test set, there is a very high chance it will also perform well on the private test set.
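To stay within the 100,000-parameter limit, it helps to estimate a network's size before training it. The sketch below counts parameters by hand for convolutional and fully connected layers. The layer sizes are a hypothetical architecture chosen only for illustration (the input size and class count are assumptions), not a recommended or reference design.

```python
PARAM_BUDGET = 100_000  # limit stated in the competition rules


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameters in a 2D convolution with a square kernel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical network: two 3x3 convolutions followed by one linear classifier.
# Assumes 3-channel inputs, an 8x8 feature map before the classifier, and 10 classes.
layer_counts = [
    conv2d_params(3, 16, 3),
    conv2d_params(16, 32, 3),
    linear_params(32 * 8 * 8, 10),
]
total = sum(layer_counts)
within_budget = total <= PARAM_BUDGET
```

In PyTorch itself, a common idiom for the same check is `sum(p.numel() for p in model.parameters())`, which counts every parameter registered on the model.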

Competition Bonus

  • 1st place: +10 to final grade
  • Places 2-3: +7
  • Places 4-5: +6
  • Places 6-11: +5