Machine Learning in Practice: course from IBS Training Center, 41,500 rub., 24 hours of training, date: November 26, 2023.
Miscellaneous / December 02, 2023
The course is built around several practical cases containing tables with initial data.
For each case, we go through the full life cycle of a machine learning project:
data exploration, cleaning and preparation,
choosing a learning method suited to the task (linear regression for regression, random forest for classification, K-means and DBSCAN for clustering),
training using the chosen method,
result evaluation,
model optimization,
presentation of the result to the customer.
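As a rough illustration of the life cycle listed above, the following Python sketch cleans a tiny table, fits a one-feature linear regression by least squares and evaluates it on a held-out row. The data, the meaning of the columns and all function names are invented for illustration; the course description does not say which tools or datasets are used.

```python
from statistics import mean

# Hypothetical raw table: (area_m2, price) rows, some with missing values.
raw_rows = [(30.0, 3.1), (45.0, 4.4), (None, 5.0),
            (60.0, 6.1), (75.0, None), (90.0, 8.9)]

# Cleaning: drop rows that contain a missing value.
rows = [(x, y) for x, y in raw_rows if x is not None and y is not None]

# Preparation: hold out the last clean row for evaluation.
train, test = rows[:-1], rows[-1:]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between predictions and true values."""
    return mean(abs(y - (slope * x + intercept)) for x, y in points)


# Training and evaluation.
slope, intercept = fit_line(train)
test_mae = mean_absolute_error(test, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, test MAE={test_mae:.3f}")
```

Real projects would use far more data and a proper train/test split; the point here is only the order of the steps.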
In the discussion part of the course, we look at practical problems from the students' own work that can be solved with the methods covered.
Topics covered:
1. Review of the task (theory – 1 hour)
Which problems does machine learning solve well, and which problems do people merely try to solve with it?
What happens if, instead of a Data Scientist, you hire a non-specialist in the field (just a developer/analyst/manager) with the expectation that they will learn in the process.
2. Data preparation, cleaning and exploration (theory – 1 hour, practice – 1 hour)
How to make sense of the source business data (and detect any order in it at all).
Order of processing steps.
What can and should be delegated to domain analysts, and what is best done by the Data Scientist themselves.
Priorities for solving a specific problem.
3. Classifiers and Regressors (theory – 2 hours, practice – 2 hours)
Practical section - well-formalized tasks with prepared data.
Differences between task types (binary, multiclass and probabilistic classification; regression), and recasting a task from one type to another.
Examples of classifying practical problems by task type.
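One common way to move between the task types mentioned above is to turn a probabilistic output into a binary label with a threshold, or to turn a numeric regression target into classes by binning. The sketch below shows both; the function names, thresholds and bin edges are hypothetical choices made for illustration.

```python
from bisect import bisect_right


def to_binary(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def to_classes(values, edges, labels):
    """Bin numeric values into classes; needs len(labels) == len(edges) + 1."""
    return [labels[bisect_right(edges, v)] for v in values]


# Probabilistic classification -> binary classification.
scores = [0.1, 0.5, 0.93]
default_labels = to_binary(scores)
strict_labels = to_binary(scores, threshold=0.9)

# Regression target -> multiclass target.
spend = [120, 480, 950]
segments = to_classes(spend, edges=[200, 800], labels=["low", "medium", "high"])
print(default_labels, strict_labels, segments)
```

The choice of threshold or bin edges is a business decision as much as a technical one, since it changes which errors the model makes.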
4. Clustering (theory – 1 hour, practice – 2 hours)
Where and how to apply clustering: data exploration, checking the problem statement, checking the results.
Which cases can be reduced to clustering.
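To make the clustering topic concrete, here is a minimal K-means sketch in plain Python on six invented 2D points. The initial centroids are passed in explicitly to keep the run deterministic, which is a simplification; library implementations normally choose them automatically. DBSCAN is not shown.

```python
from math import dist  # Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate assignment and centroid update."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Invented data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])

# Within-cluster sum of squares, a simple internal quality measure.
inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
print(labels, centroids, round(inertia, 3))
```

On real data the number of clusters is not known in advance, so the same run is usually repeated for several values and compared with an internal metric.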
5. Model evaluation (theory – 1 hour, practice – 1 hour)
Business metrics and technical metrics.
Metrics for classification and regression problems, confusion matrix.
Internal and external metrics of clustering quality.
Cross-validation.
Assessing overfitting.
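Two of the evaluation tools listed above can be sketched in a few lines of Python: the four cells of a binary confusion matrix with precision and recall, and the index splits used in k-fold cross-validation. The labels below are invented and the helper names are hypothetical. A large gap between the score on the training folds and the score on the held-out fold is the usual sign that a model has fitted the training data too closely.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (1 = positive)."""
    pairs = Counter(zip(y_true, y_pred))
    return {"tp": pairs[(1, 1)], "fp": pairs[(0, 1)],
            "fn": pairs[(1, 0)], "tn": pairs[(0, 0)]}


def precision_recall(c):
    """Precision and recall from confusion counts (0.0 if undefined)."""
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n rows."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
folds = list(k_fold_indices(5, 2))
print(counts, precision, recall, folds)
```

In practice the rows are usually shuffled before splitting into folds; contiguous folds are used here only to keep the example easy to follow.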
6. Optimization (theory – 5 hours, practice – 3 hours)
What makes one model better than another: parameters, features, ensembles.
Managing hyperparameter settings.
Feature selection practice.
Review of tools for finding the best parameters, features and methods.
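The simplest tool for finding good parameters is an exhaustive grid search: try every combination and keep the one with the best validation score. The sketch below is a generic version in plain Python. The scoring function is a made-up stand-in (in a real project it would train a model and return a cross-validated score), and the parameter names `depth` and `rate` are hypothetical.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every parameter combination; return the best one and its score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(depth, rate):
    """Stand-in for a real validation score; peaks at depth=4, rate=0.1."""
    return -((depth - 4) ** 2) - 10 * (rate - 0.1) ** 2


best_params, best_score = grid_search(
    toy_validation_score,
    {"depth": [2, 4, 8], "rate": [0.01, 0.1, 1.0]},
)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes, which is why random or adaptive search is often preferred once there are more than a few parameters.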
7. Charts, reports, working with live tasks (theory – 2 hours, practice – 2 hours)
How to clearly explain what is happening: to yourself, to the team, to the client.
Better-looking answers to meaningless questions.
How to present three terabytes of results on one slide.
Semi-automated tests: which process checkpoints are really needed.
From live tasks to a full R&D process (“R&D in practice”): review and analysis of tasks from the audience.