Data Scientist from scratch to PRO: a course from SkillFactory, RUB 233,640, 24 months of training, date August 15, 2023.
November 29, 2023
After the basic course, you will be able to choose a narrower specialization in Data Science: ML Engineer, CV Engineer or NLP Engineer.
ML Engineer: machine learning developer
Develop a credit rating prediction model
Solve the problem of classifying spam SMS messages
Develop a system for recommending suitable products when purchasing
Build a model to increase sales in a retail business
Create images from a text description using the DALL-E neural network
CV Engineer: computer vision specialist
Learn to solve all the basic problems in the field of computer vision
Gain knowledge of the real workflow for CV models, current approaches and the advanced tools needed to create CV services
In the final project, create a virtual coach capable of assessing from video whether exercises are performed correctly
NLP Engineer: natural language processing specialist
Get to know natural language processing
Gain an understanding of NLP tasks: classification, summarization and text generation, and building machine translation and question-answering systems
In the final project, independently develop tools for automated search of contexts on given topics
BASE
At this stage, you will learn the basics of programming in Python, learn how to preprocess and analyze data, and also become familiar with the main tasks of a data scientist.
Introduction - 1 week
You will be able to set realistic learning goals for yourself, find out what value DS brings to business, get acquainted with the main tasks of a data scientist and understand how any DS project is developed.
INTRO-1. How to study effectively: onboarding
INTRO-2. Profession overview. Types of problems in Data Science. Stages and approaches to developing a Data Science project
Development design - 5 weeks
You'll learn to work with basic data types using Python and be able to use looping constructs, conditional statements, and functions in your daily work.
PYTHON-1. Python Basics
PYTHON-2. Diving into Data Types
PYTHON-3. Conditional statements
PYTHON-4. Loops
PYTHON-5. Functions and functional programming
PYTHON-6. Practice
PYTHON-7. Python Style Guide (Bonus)
Basic Mathematics - 7 weeks
MATH-1. Numbers and Expressions
MATH-2. Equations and inequalities
MATH-3. Basic concepts of function theory
MATH-4. Basics of geometry: planimetry, trigonometry and stereometry
MATH-5. Sets, logic and elements of statistics
MATH-6. Combinatorics and basics of probability theory
MATH-7. Problem solving
Working with data - 8 weeks
At this stage, you will master basic data skills: how to prepare, clean, and transform data so that it is suitable for analysis. You will also visualize and analyze data using the popular libraries Matplotlib, Seaborn and Plotly.
PYTHON-8. Data Science Tools
PYTHON-9. NumPy library
PYTHON-10. Introduction to Pandas
PYTHON-11. Basic techniques for working with data in Pandas
PYTHON-12. Advanced Data Techniques in Pandas
PYTHON-13. Data cleaning
PYTHON-14. Data visualization
PYTHON-15. Principles of OOP in Python and Debugging Code (optional module)
Project 1. Dataset analysis based on closed-ended questions
Data loading - 6 weeks
You will be able to load data from different formats and sources, with help from SQL, the Structured Query Language. You will learn to use aggregate functions, table joins, and complex joins.
PYTHON-16. How to load data from files of different formats
PYTHON-17. Retrieving data from web sources and APIs
SQL-0. Hello SQL!
SQL-1. SQL Basics
SQL-2. Aggregate functions
SQL-3. Joining tables
SQL-4. Complex joins
Project 2. Loading new data. Refining the analysis
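To give a feel for the aggregate functions and table joins covered in this block, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customers, orders), their columns and the sample rows are invented for illustration and are not taken from the course materials.

```python
import sqlite3


def total_spent_per_customer():
    """Return (name, order count, total amount) rows, sorted by name."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Hypothetical schema and data, used only for this sketch.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Anna"), (2, "Boris"), (3, "Vera")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0)],
    )
    # LEFT JOIN keeps customers with no orders;
    # COUNT and SUM are aggregate functions applied per GROUP BY group.
    cur.execute(
        """
        SELECT c.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, n_orders, total in total_spent_per_customer():
        print(name, n_orders, total)
```

The LEFT JOIN is a deliberate choice here: an inner join would silently drop the customer who has no orders.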
Statistical data analysis - 7 weeks
Exploratory data analysis (EDA) will be your focus. You will become familiar with all stages of such analysis and learn how to conduct it using the libraries Statsmodels, scikit-learn, Seaborn, Matplotlib, SciPy and Pandas. In addition, you will be able to work on Kaggle, a popular service for participating in competitions.
EDA-1. Introduction to exploratory data analysis. EDA Algorithms and Methods
EDA-2. Mathematical statistics in the context of EDA. Types of features
EDA-3. Feature Engineering
EDA-4. Statistical data analysis in Python
EDA-5. Statistical data analysis in Python. Part 2
EDA-6. Design of experiments
EDA-7. Kaggle platform
Project 2
Introduction to Machine Learning - 9 weeks
You will become familiar with ML libraries for modeling data dependencies. You will be able to train the main types of ML models, perform validation, interpret the results of the work and select important features (feature importance).
ML-1. Machine learning theory
ML-2. Supervised Learning: Regression
ML-3. Supervised Learning: Classification
ML-4. Unsupervised learning: Clustering and dimensionality reduction techniques
ML-5. Data validation and model evaluation
ML-6. Feature selection
ML-7. Optimizing model hyperparameters
ML-8. ML Cookbook
Project 3. Classification problem
MAIN UNIT
Linear algebra, mathematical analysis, discrete mathematics: it sounds scary, but don’t be scared. We’ll work through all these subjects and teach you how to use them! In the second stage, you will dive into mathematics and the basics of machine learning, learn more about DS professions, and, through career guidance, select a track of study for the second year.
Mathematics and machine learning. Part 1 - 6 weeks
You will be able to solve practical problems both by hand and in Python (vector and matrix calculations, working with sets, studying functions using differential calculus).
MATH&ML-1. Linear algebra in the context of Linear methods. Part 1
MATH&ML-2. Linear algebra in the context of Linear methods. Part 2
MATH&ML-3. Mathematical analysis in the context of an optimization problem. Part 1
MATH&ML-4. Mathematical analysis in the context of an optimization problem. Part 2
MATH&ML-5. Mathematical analysis in the context of an optimization problem. Part 3
Project 4. Regression problem
Mathematics and machine learning. Part 2 - 6 weeks
You will become familiar with the basic concepts of probability theory and mathematical statistics and with clustering algorithms, and also learn to evaluate the quality of a clustering and present the results in graphical form.
MATH&ML-6. Probability theory in the context of a Naive Bayes classifier
MATH&ML-7. Algorithms based on Decision Trees
MATH&ML-8. Boosting & Stacking
MATH&ML-9. Clustering and dimensionality reduction techniques. Part 1
MATH&ML-10. Clustering and dimensionality reduction techniques. Part 2
Project 5. Ensemble methods
Discrete Mathematics - 4 weeks
MATH&MGU-1. Sets and combinatorics
MATH&MGU-2. Logic
MATH&MGU-3. Graphs. Part 1
MATH&MGU-4. Graphs. Part 2
ML in business - 8 weeks
You will learn to use ML libraries to solve time series and recommender system problems. You will be able to train an ML model and validate it, as well as create a working prototype and run the model in a web interface. You will also gain A/B testing skills so that you can evaluate the model.
MATH&ML-11. Time series. Part 1
MATH&ML-12. Time series. Part 2
MATH&ML-13. Recommender systems. Part 1
MATH&ML-14. Recommender systems. Part 2
PROD-1. Preparing the model for Production
PROD-2. Prototype: Streamlit + Heroku
PROD-3. Business understanding. Case
Project 6. Topic to choose from: Time series or Recommender systems
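As a small illustration of the A/B testing mentioned in this block, the sketch below implements a two-proportion z-test in plain Python. It assumes the metric being compared is a conversion rate and that samples are large enough for the normal approximation. The function name and the sample numbers are hypothetical, and the course itself may teach other methods.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in groups A and B.
    n_a, n_b: number of users in groups A and B.
    Returns (z statistic, p-value). Assumes large samples and a pooled
    conversion rate strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: old model (A) vs. new model (B).
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.3f}, p-value = {p:.5f}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone. Whether it is large enough to matter for the business is a separate question.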
PRO LEVEL
In the third stage, you will become familiar with deep learning (DL), one of the methods of machine learning. A full block in your chosen specialization also awaits you: you can master machine learning (ML) skills, get to know the day-to-day work of computer vision (CV), or improve in natural language processing (NLP).
Second year of study - 3 specializations to choose from
Career guidance
ML, CV or NLP: at this stage you finally have to choose which path to take next. We will tell you about each specialization and give you several practical problems to solve to make the decision easier.
ML Engineer track
In the ML track, you will learn to solve advanced machine learning problems, master the competencies of a data engineer, and hone your skills with Python libraries. You will also learn how to create an MVP (minimum viable product), learn the ins and outs of deploying an ML model to production, and find out how ML engineers work in real life.
Introduction to Deep Learning
Data Engineering Basics
Additional Python and ML chapters
Economic evaluation of effects and MVP development
ML to Production
In-depth study of ML development and graduation project on a chosen topic
CV Engineer track
On the CV track you will learn to solve computer vision problems such as image classification, segmentation and detection, image generation and stylization, and photo restoration and quality enhancement. In addition, you will learn how to roll out neural networks into production.
Introduction to Deep Learning
Data Engineering Basics
Additional Python and ML chapters
Economic evaluation of effects and MVP development
ML to Production
In-depth study of ML development and graduation project on a chosen topic
NLP Engineer track
On the NLP track, you will learn how to solve the main problems of natural language processing, including classification, summarization and text generation, machine translation and the creation of dialogue systems.
Introduction to Deep Learning
Neural Network Mathematics for NLP
Hardware & software for solving NLP problems
NLP tasks and algorithms
Neural networks in Production
In-depth study of NLP development and graduation project on a chosen topic
If you choose the CV or ML specialization, you can take the NLP course without mentor support for free.
Deep Learning and Neural Networks
Where are neural networks used? How do you train a neural network? What is Deep Learning? You will find the answers to these questions in the bonus DL section.
Introduction to Data Engineering
You will learn the difference between the roles of a data scientist and a data engineer, what tools data engineers use in their work, and what tasks they solve on a daily basis. The words “snowflake”, “star” and “lake” will take on new meanings :)