This lecture covers the theoretical and practical aspects of data mining approaches and algorithms for very large datasets. We will study scalable approaches to some fundamental problems, including finding similar items, clustering, and graph mining. The lecture combines theory with programming, and students will be introduced to large-scale distributed data processing frameworks such as Hadoop and Spark.
Each lecture will be accompanied by a set of exercise questions that students should complete before the next lecture. To be evaluated, exercises must be handed in by the Tuesday before the next lecture, either scanned and emailed or delivered by hand. Please do not submit a smartphone photo of the assignment in place of a scan.
After every lecture there will be an hour-long tutorial session in which students present their solutions to the previous lecture's exercise questions. All students are expected to present in these sessions and to complete the exercises on time.
| # | Date | Topic | Notes | Exercise | Solutions |
|---|------|-------|-------|----------|-----------|
| 2 | 14.04.2016 | Distributed Data Processing Frameworks | Lecture Notes | Exercise | |
| 3 | 21.04.2016 | Finding Similar Items | Lecture Notes | Exercise | Solutions |
| 4 | 28.04.2016 | Mining Streams - I | Lecture Notes | Exercise | Solutions |
| 6 | 12.05.2016 | Mining Streams - II | Lecture Notes | Exercise | Solutions |
| 8 | 02.06.2016 | Graph Mining - I | Lecture Notes | Exercise | Solutions |
| 9 | 09.06.2016 | Graph Mining - II (No exercise session) | Lecture Notes | Exercise | Solutions |
| 10 | 16.06.2016 | Assignment Presentations | Lecture Notes | | |