Large Scale Data Mining


  • Exam Results: Final grades (including bonus points) can be found here. Each student is identified by a code: Immatrikulationsnummer % 37317 (the matriculation number modulo 37317).

  • Exam: The exam will be held on the 29th of July, 14:00 to 16:00, at Appelstraße 4, Informatik (Gebäude 3703).

  • Info: The first lecture will be held in IL2 (15th Floor, Room 1514) at L3S on 07.04.2016.

  • Programming: Datasets and instructions can be found here.
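
For students unsure how to look up their entry in the results list, the code described under Exam Results is simply the remainder of the matriculation number after division by 37317. A minimal sketch follows; the function name and the sample number 3000000 are made up for illustration and do not belong to any real student.

```python
def result_code(matriculation_number: int) -> int:
    """Return the code used in the results list: the number modulo 37317."""
    return matriculation_number % 37317


# Hypothetical matriculation number, used only as an example.
print(result_code(3000000))  # prints 14640
```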

Aim of the Lecture

The aim of this lecture is to teach the theoretical and practical aspects of data mining approaches and algorithms for very large datasets. We will study scalable approaches to fundamental problems, including finding similar items, clustering, and graph mining. The lecture combines theory with programming work. Students will be exposed to large-scale distributed data processing frameworks such as Hadoop and Spark.


Each lecture will be accompanied by a set of exercise questions that students should complete before the next lecture. Exercises must be handed in by the Tuesday before the next lecture in order to be evaluated. Exercises can be either scanned and emailed or delivered by hand. A smartphone photo of the assignment does not count as a scan.
After every lecture we will have an hour-long tutorial session in which students will be asked to present solutions to the previous lecture's exercise questions. All students are expected to present in the session and to complete the exercises on time.

  • Students can present solutions in the exercises for grade improvement.
  • Only correct solutions (submitted on time) are eligible for presentations during the exercise session.
  • Every 3 solutions presented earn a 0.3 grade improvement in the final exam. The maximum improvement is 1.0 grade points.
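
The bonus rule above can be sketched as a small calculation. This is only an illustration under one assumption that the rule does not spell out: that only complete groups of three presented solutions count, so one or two leftover presentations earn nothing. The function name is hypothetical.

```python
def bonus_grade_points(solutions_presented: int) -> float:
    """Grade improvement earned from presented solutions.

    Assumption: only complete groups of three solutions count.
    Each group is worth 0.3 grade points, capped at 1.0 in total.
    """
    if solutions_presented < 0:
        raise ValueError("solutions_presented must be non-negative")
    # Work in tenths of a grade point to avoid floating-point drift.
    tenths = 3 * (solutions_presented // 3)
    return min(tenths, 10) / 10


for n in (2, 3, 9, 12):
    print(n, bonus_grade_points(n))
```

Under this reading, nine presented solutions give 0.9 grade points, and the cap of 1.0 is reached with the fourth group of three.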


#  | Date       | Lecture                                  | Links
1  | 07.04.2016 | Introduction                             | Lecture Notes
2  | 14.04.2016 | Distributed Data Processing Frameworks   | Lecture Notes, Exercise
3  | 21.04.2016 | Finding Similar Items                    | Lecture Notes, Exercise, Solutions
4  | 28.04.2016 | Mining Streams - I                       | Lecture Notes, Exercise, Solutions
5  | 05.05.2016 | No Class                                 |
6  | 12.05.2016 | Mining Streams - II                      | Lecture Notes, Exercise, Solutions
7  | 26.05.2016 | Clustering                               | Lecture Notes, Exercise, Solutions
8  | 02.06.2016 | Graph Mining - I                         | Lecture Notes, Exercise, Solutions
9  | 09.06.2016 | Graph Mining - II (No exercise session)  | Lecture Notes, Exercise, Solutions
10 | 16.06.2016 | Assignment Presentations                 | Lecture Notes
11 | 23.06.2016 | Conclusions                              | Lecture Notes