Learning outcomes
At the end of the course the student knows and understands:
- the motivation and the components of the Data Mining process;
- the general concepts, technologies and methodologies of Data Warehouse, OLAP and Data Lake, as enabling factors of the Data Mining process;
- the principles and the most relevant use cases of a wide set of Machine Learning algorithms which are used to extract relevant and actionable information from large amounts of data.
At the end of the course the student is able to:
- design the main steps of a Data Mining process;
- choose the Machine Learning methods best suited for the process;
- evaluate the quality of the result in order to support strategic and operational decisions.
Course contents
Part 1 - Data Mining
- Introduction to the Data Mining Process
- Architectures of systems with data mining components
- Enterprise Data Warehouse
- Data Lake
- Case studies
Part 2 - Machine Learning
- What is Machine Learning: some history and motivating examples
- Theory of learning
- Supervised vs unsupervised learning
- Classification and regression
- Model Selection, validation and presentation of results
- Regression
- Classification with linear discrimination, decision trees, Bayesian inference, Support Vector Machines, k-nearest neighbors, logistic regression, random forests, AdaBoost
- Ensemble learning, boosting, bagging
- Association rules and the Apriori algorithm
- Clustering/segmentation with k-means, DBSCAN, Expectation Maximization, hierarchical methods, kernel methods
- Analysis of case studies
- CRISP-DM methodology
Pre-requisites
- Fundamentals of programming
- Fundamentals of calculus and linear algebra
- Fundamentals of statistics and probability
- Some general knowledge of Database Management Systems is useful
Lectures
Lecture list
- Lecture 1: Introduction
Exam
Knowledge is verified through multiple-choice questions. To pass, the student must correctly answer at least half of the questions plus one. The weight of this part is 33%.
Abilities are verified in the lab through the development of a program that carries out a Machine Learning task on an assigned data set. The solution is evaluated on the correctness of the approach, the correctness of the solution, and the quality of the code and of the documentation. To pass, the student must at least present a sensible approach and reasonable code. The weight of this part is 67%.
Reading material
- Introduction to machine learning / Ethem Alpaydin. - 3. ed. - Cambridge : The MIT Press, 2014. - XXII, 613 p. (online version)
- Scikit-Learn, or Python Data Science Handbook
- Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22.