The aim of this course is to study efficient algorithms for processing large datasets. We will study scalable approaches to some of the fundamental problems of finding similar items, clustering, recommender systems and graph mining. The course covers both theoretical and programming aspects, and students will gain hands-on exposure to large-scale distributed data processing frameworks such as Hadoop and Spark.
Each lecture is accompanied by a set of exercise questions that students should complete before the next lecture. To be evaluated, exercises must be handed in by the Tuesday before the next lecture. Solutions can be submitted either scanned and sent by email, or on paper in person. Please note that a smartphone photo of the assignment does not count as a scan.
Every lecture is followed by an hour-long tutorial session in which students present their solutions to the exercise questions from the previous lecture. All students are expected to complete the exercises on time and to present in the tutorial sessions.
| No. | Date | Topic | Materials | Assignment | Reading |
|---|---|---|---|---|---|
| 2 | 18.04.2018 | Finding Similar Items | Lecture Notes | Assignment 1 | Textbook reference |
| 3 | 25.04.2018 | MapReduce | Lecture Notes and Code | Assignment 2 | |
| 4 | 02.05.2018 | Streaming | Lecture Notes | Assignment 3 | |
| 5 | 09.05.2018 | Streaming | Lecture Notes | Assignment 3 | |