After the course, the student can:
- describe the workflow of algorithmic data analysis
- recognize the main data analysis tasks (description vs. prediction)
- implement and use basic classification and itemset mining methods
- evaluate the results of a method and diagnose and remedy its major shortcomings (over- and underfitting)
Basic classification methods. Model evaluation, under- and overfitting, bias and variance.
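As an illustration of under- and overfitting, the sketch below evaluates a k-nearest-neighbour (k-NN) classifier on synthetic data using a training/test split. k-NN is an assumed example and not necessarily one of the methods covered in the course. The data, the noise level, and the helper names (`make_data`, `knn_predict`, `error_rate`) are hypothetical. With k = 1, the classifier memorizes the training set, including its noisy labels, which gives zero training error (low bias, high variance). With k equal to the training-set size, it predicts the majority class everywhere (high bias, low variance).

```python
import random
from collections import Counter


def make_data(n, noise, rng):
    """Hypothetical 1-D toy data: true label is 1 when x > 0.5; a fraction `noise` of labels is flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a well-fitted model should not memorize
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(y for _, y in neighbours)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of points in `data` misclassified by k-NN fitted on `train`."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


rng = random.Random(0)
train = make_data(101, noise=0.2, rng=rng)  # odd size, so k = len(train) never ties
test = make_data(100, noise=0.2, rng=rng)

# k = 1: overfits (memorizes noise); k = len(train): underfits (constant majority vote).
for k in (1, 15, len(train)):
    print(f"k={k:3d}  train error={error_rate(train, train, k):.2f}  "
          f"test error={error_rate(train, test, k):.2f}")
```

Comparing training and test error across values of k is the basic diagnostic. A large gap between the two errors suggests overfitting, while high error on both suggests underfitting.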
Frequent itemset mining. Support, downward closure, level-wise search.
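A minimal sketch of level-wise (Apriori-style) frequent itemset mining follows. It assumes that support is measured as an absolute transaction count and that transactions are given as sets of items. The function names are hypothetical. At each level k, frequent (k-1)-itemsets are joined into candidate k-itemsets. Any candidate that has an infrequent (k-1)-subset is then pruned, by downward closure: every subset of a frequent itemset must itself be frequent. Support is counted only for the surviving candidates. The join used here is a simple pairwise union, whereas the classic Apriori algorithm uses a prefix-based join.

```python
from itertools import combinations


def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items.
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Downward-closure pruning: all (k-1)-subsets must be frequent.
        prev = set(level)
        candidates = [c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))]
        # Support counting only for the surviving candidates.
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
found = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# ['bread'] 3
# ['butter'] 2
# ['milk'] 3
# ['bread', 'butter'] 2
# ['bread', 'milk'] 2
```

In this example, {bread, milk, butter} is generated as a candidate at level 3. It is pruned without its support being counted, because its subset {milk, butter} has support 1 and is therefore infrequent.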
Python is the programming language used in the course.
Modes of study
Course meetings: ca. 6 h for exercise sessions and 6 h for quizzes/tutoring sessions
Self-study: ca. 42 h, including watching the video lectures, working out the exercise problems, and completing the computing assignments
Students can pass the course either by:
1) attending the course meetings, participating actively, submitting exercise solutions regularly, and taking a short standard exam, or
2) studying autonomously and taking a longer general exam.
Additionally, in both cases, the student is required to complete two out of three computing assignments.
The course consists of:
- video lectures with supporting slides, to present the concepts and methods
- quizzes, to help students assimilate the concepts
- exercises, to apply the methods step by step to toy examples
- computing assignments, to implement the methods and apply them to example datasets
- an exam
Pen-and-paper exercises typically require running the algorithms studied in the lectures by hand on a small example. This helps students understand the details of each procedure and paves the way for implementing it in the assignments. Worked-out examples can also serve as test cases for checking the correctness of an implementation.
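The following sketch shows how a worked-out pen-and-paper example can be used as a test case. It checks a hypothetical `count_support` function against support counts computed by hand on a toy transaction table, using Python's standard `unittest` module. The dataset, the function, and the test-class name are illustrative assumptions, not course material.

```python
import unittest


def count_support(itemset, transactions):
    """Hypothetical implementation under test: absolute support of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


# Toy dataset of the kind used in a pen-and-paper exercise.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c", "e"},
]


class TestSupportAgainstWorkedExample(unittest.TestCase):
    def test_hand_computed_supports(self):
        # Support counts worked out by hand on TRANSACTIONS.
        expected = {
            ("a",): 3,
            ("c",): 3,
            ("a", "c"): 2,
            ("b", "c"): 2,
            ("a", "b", "c"): 1,
            ("d", "e"): 0,
        }
        for itemset, count in expected.items():
            # subTest reports each failing itemset separately.
            with self.subTest(itemset=itemset):
                self.assertEqual(count_support(itemset, TRANSACTIONS), count)
```

If the snippet is saved as a module (for example, a hypothetical `test_support.py`), it can be run with `python -m unittest test_support`.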
Study materials are available on Moodle.
The lectures are based primarily on “Data Mining: The Textbook” by Charu C. Aggarwal, Chapters 4, 5.2, and 10.
Students must complete the required number of exercises and computing assignments to be eligible to take the exam.
Students must score above the threshold number of exam points to pass.
The final grade is a weighted average of the exam, exercise, and computing assignment points, with comparable weights assigned to these components. The most favorable subset of the computing assignments is used.
Probabilistic Inference for Data Science (or equivalent knowledge). Design and Analysis of Algorithms (or equivalent knowledge).
Open to everyone. The aim of this course is to prepare students who did not take “Introduction to Algorithmic Data Analysis (JAD)” for advanced Algorithmic Data Analysis courses and seminars (e.g. ADA).
Teaching in English