Course “Data Analyst” from Yandex Practicum: 96,000 rub., 7 months of training, date: December 7, 2023.
Miscellaneous / / December 02, 2023
A data analyst extracts meaning from numbers and values: they spot trends, predict events and help a company understand its customers, optimize processes and grow.
The market needs specialists who can put data to good use. A September 2022 study by the recruitment company Ancor found that 45% of Russian companies were looking for analysts to join their teams.
Skills you will learn on the course
Job title
Analyst, Data Analyst
Development opportunities: Product Analyst, Marketing Analyst, BI Analyst, Data Science Specialist
Here are the technologies and tools you will use:
Python
Jupyter Notebook
SQL
PostgreSQL
Tableau
A/B tests
Start making money by analyzing
You will start in a junior position and move forward from there. You will climb the career ladder and grow in value, and one day you will be invaluable.
Complete Data Analytics Course Program
We update it regularly to ensure it meets industry and employer needs.
In other words, you learn only what will definitely be useful in your work.
Free part - 1 week
Free Introduction: Basics of Python and Data Analysis
Learn the basic concepts of data analysis and understand what data analysts and data scientists do.
• Moscow Catnamycs. Displaying data on the screen. CSV files. Working with tables. Heat maps. Multiplying a column by an integer.
• Errors in the code. Syntax errors. Naming errors. Errors when dividing by zero. Errors when importing a module.
• Variables and data types. Variables. Data types. Arithmetic operations with numbers and strings.
• How to formulate hypotheses. Hypotheses. HADI cycles. Analytical thinking. Reading graphs.
• What data scientists do. Analyst tasks. Clarification of tasks. Decomposition. Project stages.
• Checking conversions. Conversion. Data exploration. Formation of conclusions.
• Return on advertising campaigns. Bar chart. Element-wise difference. Indexing in columns.
• Machine learning and Data Science. Training in machine learning. Finding unique values in columns. Logical indexing. Grouping values in a table. Prediction errors.
• Final project. User segmentation.
Python, Pandas, Errors, Seaborn, Hypotheses, Conversion, Variables, Data Types, Heatmaps
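The error types named in the free part (syntax, naming, division by zero, module import) can be sketched in a few lines of Python. This is only an illustration: the helper error_name and the module name no_such_module_for_this_sketch are invented here and do not come from the course materials.

```python
import importlib


def error_name(action):
    """Run action() and return the class name of the error it raises, or None."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return None


checks = [
    # Syntax error: the opening bracket is never closed.
    lambda: compile("print('hello'", "<example>", "exec"),
    # Naming error: the variable was never defined.
    lambda: undefined_variable + 1,
    # Division by zero.
    lambda: 1 / 0,
    # Import error: a module with this (invented) name does not exist.
    lambda: importlib.import_module("no_such_module_for_this_sketch"),
]

names = [error_name(check) for check in checks]
print(names)
```

Each lambda delays the faulty operation until error_name calls it, so all four errors can be caught and compared in one run.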
1 sprint 3 weeks
Basic Python
Dive deeper into the Python programming language and the Pandas library.
• Variables and data types. The Python language. Variables. Printing data to the screen. Printing objects to the screen. Error handling, the try...except statement. Data types. Data type conversions.
• Strings. String indexing. String slices. String operations. String methods. String formatting, the format() method, f-strings.
• Lists. List indexing. List slices. Adding items to a list. Removing list items. Addition and multiplication of lists.
• Sorting lists. Searching for items in a list. Splitting a string into a list of strings, joining a list of strings into a string.
• The for loop. Loops. Iterating over elements. Iterating over element indices. Processing list elements with loops: finding the sum and product of elements.
• Nested lists. Looping through nested lists with counting values. Adding elements to nested lists. Sorting nested lists.
• Conditional statement. While loop. Boolean data type. Boolean values. Logical expressions. Compound logical expressions. The if...elif...else conditional statement. Branching. Filtering lists with a conditional statement.
• Functions. Purpose of functions. Parameters and arguments. Parameters with default values. Positional and keyword arguments. Returning a result from a function.
• Dictionaries. Keys and values. Looking up a value by key. Adding items to a dictionary. List of dictionaries. Pretty-printing dictionaries.
• Pandas library. Reading CSV files. Dataframe. Dataframe constructor. Printing the first and last rows of a dataframe. Indexing in dataframes. Indexing in Series.
• Data preprocessing. The GIGO principle. Renaming dataframe columns. Handling missing values. Handling explicit and implicit duplicates.
• Data analysis and presentation of results. Grouping data. Sorting data. Basics of descriptive statistics.
• Jupyter Notebook - a notebook made of cells. Jupyter Notebook interface. Jupyter Notebook shortcuts.
Loops, Python, Pandas, Strings, Lists, Functions, Dictionaries, DataFrame, Variables, Data Types, Conditional Statement
Project
Compare Yandex Music user data by city and day of the week.
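To give a feel for how the basics from this sprint fit together (lists of dictionaries, a for loop, a function, try...except and f-strings), here is a minimal sketch that counts listening records by city and day of the week. The records and field names are invented for the illustration; the real project data and its structure are not shown on this page.

```python
from collections import defaultdict

# Hypothetical listening records; the field names are assumptions.
plays = [
    {"city": "Moscow", "day": "Monday", "track": "A"},
    {"city": "Moscow", "day": "Friday", "track": "B"},
    {"city": "Saint Petersburg", "day": "Monday", "track": "C"},
    {"city": "Moscow", "day": "Monday", "track": "D"},
    {"city": "Moscow", "track": "E"},  # broken record: no day
]


def count_plays(records):
    """Count records for every (city, day) pair, skipping incomplete ones."""
    counts = defaultdict(int)
    for record in records:
        try:
            key = (record["city"], record["day"])
        except KeyError:
            continue  # a required field is missing
        counts[key] += 1
    return dict(counts)


counts = count_plays(plays)
for (city, day), number in sorted(counts.items()):
    print(f"{city:<17} {day:<7} {number}")
```

In the course the same comparison would more likely be done with Pandas grouping; plain Python is used here so that the logic of the loop stays visible.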
2 sprint 2 weeks
Data preprocessing
Learn to clean data of outliers, missing values and duplicates, and to convert between data formats.
• Working with missing values. Conversion. Cookies. Categorical and quantitative variables. Handling missing values in categorical variables. Handling missing values in quantitative variables. Handling missing values in quantitative variables by category.
• Changing data types. Reading Excel files. Converting a Series to a numeric type. Absolute value of a number, the abs() function. Working with date and time. Error handling, the try...except statement. Merging dataframes, the merge() method. Pivot tables.
• Finding duplicates. Finding duplicates that differ only in letter case.
• Data categorization. Decomposition of tables. Categorization by numerical ranges. Categorize based on multiple values per row.
• Systematic and critical thinking in the work of an analyst. Systems thinking. Causes of data errors. Critical thinking.
Python, Pandas, Gap handling, Data processing, Duplicate processing, Data categorization
Project
Analyze data about bank clients and determine the share of creditworthy ones.
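Three of the steps from this sprint can be sketched with Pandas: removing case-only differences in a category, filling missing values in a quantitative column by category, and categorizing by numerical ranges. The column names, the values and the range boundaries below are invented for the example and are not the project's data.

```python
import pandas as pd

# Invented client data: "education" has implicit duplicates, "income" has a gap.
df = pd.DataFrame({
    "education": ["Higher", "higher", "Secondary", "secondary", "Higher"],
    "income": [100.0, None, 40.0, 60.0, 120.0],
})

# Implicit duplicates: the same category written in different letter case.
df["education"] = df["education"].str.lower()

# Fill the gap with the median income of the same education category.
category_median = df.groupby("education")["income"].transform("median")
df["income"] = df["income"].fillna(category_median)


def income_category(income):
    """Assign a category by numerical range (boundaries are arbitrary)."""
    if income < 50:
        return "low"
    if income < 110:
        return "medium"
    return "high"


df["income_category"] = df["income"].apply(income_category)
print(df)
```

The median is used instead of the mean because it is less sensitive to extreme values; whether that is the right choice depends on the data at hand.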
3 sprint 2 weeks
Exploratory data analysis
Learn the basics of probability and statistics. Use them to explore the basic properties of data, looking for patterns, distributions and anomalies. Get to know the Matplotlib library. Draw diagrams and practice analyzing graphs.
• First graphs and conclusions. Using pivot tables. Histogram. Distributions. Box plot.
• Study of data slices. The query() method. Working with date and time. Plotting graphs using the plot() method. Occam's razor.
• Working with multiple data sources. Data slice based on external objects. Adding new columns to a dataframe. Adding data from other dataframes. Renaming columns. Combining tables using the merge() and join() methods.
• Data relationships. Scatterplot. Correlation of variables. Scatterplot matrix.
• Validation of results. Consolidation of groups. Dividing data into groups.
Python, Pandas, Matplotlib, Histograms, Data Slices, Data Analysis, Scatterplot, Data Visualization, Descriptive Statistics
Project
Explore the archive of advertisements for the sale of real estate in St. Petersburg and the Leningrad region.
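When looking for anomalies, a box plot is usually read through its quartiles: values far beyond the interquartile range are treated as candidate outliers. The sketch below computes those boundaries by hand with the standard statistics module. The numbers are invented, and the 1.5 x IQR rule is a common convention, not something this page prescribes.

```python
import statistics

# Invented values; the last one is deliberately extreme.
prices = [1, 2, 3, 4, 5, 6, 7, 8, 100]

# Quartiles split the sorted data into four parts.
q1, median, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1  # interquartile range

# Conventional whisker limits of a box plot.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [p for p in prices if p < lower or p > upper]

mean = statistics.mean(prices)
print(f"median={median}, mean={mean:.2f}, outliers={outliers}")
```

The gap between the mean and the median is itself a hint that the distribution is skewed by a few extreme values.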
4 sprint 3 weeks
Statistical data analysis
Learn to analyze relationships in data using statistical methods. Learn what statistical significance and hypotheses are.
• Combinatorics. Multiplication rule. Permutations. Number of permutations. Arrangements (ordered selections). Number of arrangements. Combinations. Number of combinations.
• Probability theory. Experiment. Probability space. Events. Probability. Intersecting and mutually exclusive events. Euler-Venn diagram. Law of large numbers.
• Descriptive statistics. Categorical and quantitative variables. Mode and median. Mean. Variance. Standard deviation. Quartiles and percentiles. Box plot. Bar chart. Frequency density. Histogram.
• Random variables. Discrete random variable. Probability distribution of a discrete random variable. Cumulative distribution function of a discrete random variable. Expected value of a discrete random variable. Variance of a discrete random variable.
• Distributions. Bernoulli's experiment. Binomial experiment. Binomial distribution. Continuous uniform distribution. Normal distribution. Standard normal distribution. CDF and PPF for normal distribution. Poisson distribution. Approximation of one distribution by another.
• Testing hypotheses. Population. Sample. Sampling distribution. Central limit theorem. One-sided and two-sided hypotheses. P-value. Testing one-sided and two-sided hypotheses for one sample. Testing the hypothesis that the means of two populations are equal. Testing the hypothesis of equal means for dependent samples.
SciPy, NumPy, Python, Pandas, Matplotlib, Combinatorics, Distributions, Hypothesis testing, Probability theory
Project
Test hypotheses for a scooter rental service to help the business grow.
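Several of the formulas from this sprint fit in a short Python sketch: the number of permutations, ordered selections and combinations, the binomial probability, and the CDF and PPF of the standard normal distribution. Only the standard library is used here as an assumption for portability; the sprint itself lists SciPy and NumPy.

```python
import math
from statistics import NormalDist

n, k = 5, 2

# Combinatorics.
permutations = math.factorial(n)   # orderings of all n items
arrangements = math.perm(n, k)     # ordered selections of k items out of n
combinations = math.comb(n, k)     # unordered selections of k items out of n


def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` Bernoulli trials."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


p_three_of_ten = binomial_pmf(3, 10, 0.5)

# Standard normal distribution: CDF and its inverse (PPF).
standard_normal = NormalDist(mu=0, sigma=1)
cdf_value = standard_normal.cdf(1.96)        # P(X <= 1.96)
ppf_value = standard_normal.inv_cdf(0.975)   # x such that P(X <= x) = 0.975

print(permutations, arrangements, combinations)
print(round(p_three_of_ten, 4), round(cdf_value, 3), round(ppf_value, 2))
```

The CDF and PPF are inverse to each other, which is why 1.96 and 0.975 appear as a pair in two-sided tests at the 5% significance level.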
Extra Sprint
Probability theory
Review or learn the basic terms of probability theory: independent, complementary, mutually exclusive events, etc. Using simple examples and fun problems, you will practice working with numbers and building the logic of solutions.
This is an optional sprint. This means that each student chooses one of the options:
• Master an additional sprint of 10 short lessons, brush up on theory and solve problems.
• Open only the block with interview tasks and practice without the theory.
• Skip the sprint completely or return to it when there is time and need.
Python, Events, Probability, Bayes' Theorem, Random Variables, Probability Theory, Statistical Data Analysis
5 sprint 1 week
Final project of the first module
Learn how to conduct preliminary data research and formulate and test hypotheses.
SciPy, NumPy, Python, Pandas, Matplotlib, Data analysis, Hypothesis testing, Data processing
Project
Find patterns in game sales data.
6 sprint 2 weeks
Basic SQL
Learn the basics of structured query language SQL and relational algebra for working with databases. Get acquainted with the features of working in PostgreSQL, a popular database management system (DBMS). Learn to write queries of varying levels of complexity and translate business problems into SQL. You will work with a database of an online store that specializes in films and music.
• Introduction to databases. Database management systems (DBMS). SQL language. SQL queries. Formatting SQL queries.
• Data slices in SQL. Data types in PostgreSQL. Data type conversion. WHERE clause. Logical operators. Data slices. Operators IN, LIKE, BETWEEN. Working with date and time. Handling missing values. Conditional CASE construct.
• Aggregate functions. Grouping and sorting data. Mathematical operations. Aggregate functions. Grouping data. Sorting data. Filtering on aggregated data, the HAVING clause.
• Relationships between tables. Types of table joins. ER diagrams. Renaming fields and tables. Aliases. Joining tables. Types of joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN. Combining result sets with UNION and UNION ALL.
• Subqueries and common table expressions. Subqueries. Subqueries in FROM. Subqueries in WHERE. Combining joins and subqueries. Common Table Expressions (CTE). Query variability.
SQL, DBMS, PostgreSQL, Subqueries, Databases, SQL queries, Filtering data, Sorting data, Grouping data, Joining tables, Common table expressions
Project
You will write a series of queries of varying complexity to a database that stores data on venture investors, startups and investments in them.
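A query of the kind described in this sprint might combine a join, grouping, a HAVING filter and a common table expression. The sketch below runs such a query from Python against an in-memory SQLite database, which stands in for PostgreSQL only so that the example is self-contained. The tables startup and investment and their columns are invented and are not the schema of the course database.

```python
import sqlite3

# In-memory SQLite is an assumption made for portability; the course uses PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE startup (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE investment (startup_id INTEGER, amount REAL);
INSERT INTO startup VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO investment VALUES (1, 100), (1, 250), (2, 50);
""")

query = """
WITH totals AS (
    SELECT s.name AS name,
           COUNT(i.amount) AS rounds,
           SUM(i.amount) AS total
    FROM startup AS s
    LEFT JOIN investment AS i ON i.startup_id = s.id
    GROUP BY s.name
    HAVING COUNT(i.amount) > 0      -- keep only startups with investments
)
SELECT name, rounds, total
FROM totals
ORDER BY total DESC;
"""

rows = conn.execute(query).fetchall()
for name, rounds, total in rows:
    print(f"{name}: {rounds} rounds, {total} in total")
conn.close()
```

The LEFT JOIN keeps startups without investments in the intermediate result, and HAVING then removes them after aggregation, which shows the difference between filtering rows and filtering groups.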
7 sprint 3 weeks
Analysis of business indicators
Learn what metrics are in business. Learn to use tools for data analysis in business: cohort analysis, sales funnel and unit economics.
• Metrics and funnels. Conversion. Funnels. Marketing funnel. Impressions. Clicks. CTR. Product funnel.
• Cohort analysis. User profile. Retention rate. Churn rate. Analysis horizon. Visualization of cohort analysis. Retention analysis of random cohorts. Conversion in cohort analysis. Calculating metrics in Python.
• Unit economics. The LTV, CAC and ROI metrics. ARPU, ARPPU. Calculating metrics in Python. Advanced visualization of metrics. The sharey parameter. Moving average.
• Custom metrics. User activity assessment. User session. Anomaly investigation.
Metrics, Funnels, Conversion, Unit economics, Cohort analysis, Product metrics, Marketing metrics
Project
Based on the data, understand user behavior, as well as analyze customer profitability and advertising ROI to make recommendations for the marketing department.
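The metrics from this sprint are simple ratios, which a short Python sketch makes concrete. All input numbers are invented. Definitions of ROI vary between sources; the version below, profit divided by cost, is one common choice and an assumption here, as is the helper moving_average.

```python
# Invented campaign totals for one period.
impressions = 10_000
clicks = 500
new_users = 200
paying_users = 40
revenue = 12_000.0
ad_cost = 8_000.0

ctr = clicks / impressions            # click-through rate
conversion = paying_users / new_users # share of new users who paid
arpu = revenue / new_users            # average revenue per user
arppu = revenue / paying_users        # average revenue per paying user
cac = ad_cost / new_users             # customer acquisition cost
roi = (revenue - ad_cost) / ad_cost   # one common ROI definition (assumption)


def moving_average(values, window):
    """Average of every `window` consecutive values, used to smooth noisy series."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


smoothed = moving_average([10, 20, 30, 40], window=3)

print(f"CTR={ctr:.1%}, conversion={conversion:.1%}, ROI={roi:.1%}")
print(f"ARPU={arpu}, ARPPU={arppu}, CAC={cac}, smoothed={smoothed}")
```

Comparing ARPU with CAC gives a first answer to the unit economics question: here each acquired user brings in more than it costs to attract.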
8 sprint 2 weeks
Advanced SQL
You will take an additional course on working with databases and get even closer to business tasks. Using SQL, you will work through the calculation of the main business metrics that you met in the “Analysis of business indicators” sprint. You will work with a more complex tool, window functions, and learn to change the contents of databases locally, without a simulator, using dedicated client programs and Python libraries.
• Calculation of business indicators. Data schema. Conversion. LTV. ARPU. ARPPU. ROI. Calculation using SQL.
• Aggregate window functions. The OVER clause. The PARTITION BY window parameter.
• Window ranking functions. Ranking functions. ORDER BY inside a window. ROW_NUMBER(). RANK(). DENSE_RANK(). NTILE(). Window clauses together with ranking functions.
• Window offset functions. Cumulative values. Offset functions. LEAD(). LAG(). Window functions and aliases.
• Cohort analysis. Retention Rate, Churn Rate. LTV.
• Installation and configuration of the database and database client. Database client. Installing PostgreSQL. Installing DBeaver. DBeaver interface. Creating a database. Deploying a database dump. Exporting query results. Presentation of query results.
SQL, DBMS, Metrics, PostgreSQL, Databases, SQL queries, Window functions, Cohort analysis
Project
Using Python and SQL, connect to a database, then calculate and visualize key metrics for a programming Q&A service.
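Window functions keep every row and add a value computed over a group of related rows. The sketch below shows ROW_NUMBER(), a cumulative SUM() and LAG() with PARTITION BY and ORDER BY. It assumes an in-memory SQLite database (version 3.25 or newer, which supports window functions) in place of PostgreSQL, and the payment table is invented for the example.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (user_id INTEGER, paid_on TEXT, amount INTEGER);
INSERT INTO payment VALUES
    (1, '2023-01-01', 100),
    (1, '2023-01-05', 50),
    (2, '2023-01-02', 70),
    (1, '2023-01-09', 30);
""")

query = """
SELECT user_id,
       paid_on,
       amount,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY paid_on) AS payment_no,
       SUM(amount)  OVER (PARTITION BY user_id ORDER BY paid_on) AS running_total,
       LAG(amount)  OVER (PARTITION BY user_id ORDER BY paid_on) AS previous_amount
FROM payment
ORDER BY user_id, paid_on;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

A running total per user like this one is the building block for cumulative metrics such as LTV; LAG() returns NULL (None in Python) for the first payment of each user because there is no previous row.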
9 sprint 2 weeks
Decision making in business
You will learn what A/B testing is and understand in what cases it is used. Learn to design A/B testing and evaluate its results.
• Fundamentals of hypothesis testing in business. Leading metrics. Bases of experiments. Generation of hypotheses. Prioritization of metrics. Choosing a method for conducting an experiment. Qualitative methods for testing hypotheses. Quantitative methods for testing hypotheses. Advantages and disadvantages of A/B tests.
• Prioritization of hypotheses. RICE framework. Reach parameter. Impact parameter. Confidence parameter. Effort parameter.
• Preparing to conduct an A/B test. A/A test. Type I and II errors. Power of statistical test. Significance of statistical test. Multiple comparisons, methods for reducing the likelihood of error. Calculation of sample size and duration of an A/B test. Graphical analysis of metrics.
• Analysis of A/B test results. Testing the hypothesis of equal proportions. Shapiro-Wilk test for data normality. Nonparametric statistical tests. Mann-Whitney test. Stability of cumulative metrics. Analysis of outliers and spikes.
• Behavioral algorithms. Facts, emotions, assessments. Explain your point of view.
A/B testing, Prioritization of hypotheses, Preparing for A/B testing, Analysis of A/B testing results
Project
Analyze the results of A/B testing in a large online store.
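Two calculations from this sprint can be sketched in plain Python: the RICE score for prioritizing hypotheses and a two-sided z-test for the equality of two proportions. The function names and all input numbers are invented for the illustration, and the standard library's NormalDist is used here as an assumption instead of a statistics package.

```python
from math import sqrt
from statistics import NormalDist


def rice_score(reach, impact, confidence, effort):
    """RICE priority: Reach * Impact * Confidence / Effort."""
    return reach * impact * confidence / effort


def proportions_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of the hypothesis that two proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


score = rice_score(reach=1000, impact=2, confidence=0.8, effort=4)

# Invented experiment: 100 of 1000 users converted in group A, 130 of 1000 in group B.
z, p_value = proportions_z_test(100, 1000, 130, 1000)
alpha = 0.05
print(f"RICE={score}, z={z:.2f}, p-value={p_value:.4f}, reject H0: {p_value < alpha}")
```

The significance level alpha is fixed before the test; with multiple comparisons it would need a correction, which this sketch does not apply.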
10 sprint 1 week
Final project of the second module
Learn to test statistical hypotheses using A/B testing and prepare conclusions and recommendations in analytical report format.
Sales funnel, A/B testing, Data processing, Exploratory data analysis
Project
Explore the sales funnel and analyze the results of A/B testing in the mobile application.
11 sprint 2 weeks
How to tell a story with data
You will learn how to correctly present the results of your research using graphs, the most important figures and their correct interpretation. Get to know the Seaborn and Plotly libraries.
• Whom to tell, how, what and why. Presenting research results. The storyteller's target audience. What a data analyst should tell and why.
• Seaborn Library. The Seaborn library as an extension of the Matplotlib library. jointplot() method. Color ranges. Chart styles. Visualization of distributions.
• Plotly library. Interactive graphs. Line graph. Bar chart. Pie chart. Funnel chart.
• Data visualization in geoanalytics. Geoanalytics. The Folium library. Displaying a map. Placing markers at given coordinates. Creating point clusters. Custom icons for markers. Choropleth.
• Preparing a presentation. Conclusions based on the study. Seasonality and external factors. Absolute and relative values. Simpson's paradox. Principles of constructing presentations. Reports in Jupyter Notebook.
Plotly, Folium, Seaborn, Matplotlib, Presentation, Geoanalytics, Data visualization
Project
Prepare a market study based on open data about public catering establishments in Moscow, visualize the data obtained.
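Simpson's paradox, mentioned in this sprint, is easiest to grasp with numbers: one variant can win in every segment and still lose overall when the segments differ in size. The counts, segment names and helper functions below are invented purely for illustration.

```python
# Invented counts: (conversions, visitors) for each variant and segment.
results = {
    "A": {"mobile": (81, 87), "desktop": (192, 263)},
    "B": {"mobile": (234, 270), "desktop": (55, 80)},
}


def rate(conversions, visitors):
    """Conversion rate of one segment."""
    return conversions / visitors


def overall_rate(segments):
    """Conversion rate after pooling all segments of one variant."""
    conversions = sum(c for c, _ in segments.values())
    visitors = sum(v for _, v in segments.values())
    return conversions / visitors


for segment in ("mobile", "desktop"):
    rate_a = rate(*results["A"][segment])
    rate_b = rate(*results["B"][segment])
    print(f"{segment}: A={rate_a:.1%}, B={rate_b:.1%}")

overall_a = overall_rate(results["A"])
overall_b = overall_rate(results["B"])
print(f"overall: A={overall_a:.1%}, B={overall_b:.1%}")
```

Variant A mostly received desktop traffic, where conversion is lower for everyone, so pooling the segments hides its advantage. This is why a presentation should show both absolute and relative values and name the segments behind an aggregate.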
12 sprint 2 weeks
Building dashboards in Tableau
In this sprint you will work with the Tableau BI system. Learn to connect to data and modify it, build different types of graphs, assemble dashboards and presentations.
• Basics of working with Tableau. BI systems. Tableau. Creating a document. Saving the document. Publication of the document.
• Working with data sources. Data sources. Data merging. Relationship method. Join method. Blend method. Union method. Changing the table format.
• Data types. Basic data types. Dimensions. Measures. Working with date and time. Sets. Groups. Parameters. Changing the format of variables. The Measure Names, Measure Values and Count fields.
• Tables and calculations. Sheet editing interface. Pivot tables. Calculated fields. LOD expressions.
• Filters and sorting. Sorting measures. Sorting dimensions. Nested sorts. Sorting using a parameter. Filters.
• Visualizations. Visualization controls. Heat maps. Pie charts. Bar charts. Histograms. Box plots. Scatter plot. Line graphs. Combined graphs. Area charts.
• Special visualizations and tooltips. Maps. Symbol map. Bubble chart. Tree map. Circle views. Bullet graphs. Gantt charts. Measure names and measure values in visualizations. Reverse engineering. Tooltips. Tooltips with visualizations. Threshold values on graphs. Analytical tools in Custom.
• Presentations. Extra options. Study of typical parameters. Creating a presentation.
• Dashboards. Loading and preparing data. Preparing visualizations. Dashboard assembly. Actions. Dashboard demonstration. Publishing a dashboard.
Tableau, Dashboards, BI tools, Data visualization
Project
Research the history of TED conferences and create a dashboard in Tableau based on the data obtained.
Extra Sprint
Machine Learning Basics
Get acquainted with the basics of machine learning and learn about the main tasks of machine learning in business.
Python, Pandas, Sklearn, Machine learning, Machine learning tasks, Machine learning algorithms
Extra Sprint
Practice Python
You will take several laboratory classes with additional tasks in the Python programming language. You will also learn how to extract data from web resources.
You will:
• understand the structure of HTML pages and how GET requests work,
• learn to write simple regular expressions,
• get to know the API and JSON,
• make several requests to sites and collect data.
JSON, Python, REST API, Web scraping
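The data extraction skills from this sprint can be tried without touching the network. In the sketch below, two hard-coded strings stand in for an HTML page and a JSON API response; both, along with the helper parse_price, are invented for the example. A real task would first fetch such text with a GET request.

```python
import json
import re

# Invented stand-ins for a downloaded HTML page and an API response body.
html = '<a href="/catalog">Catalog</a> <a href="/about">About</a>'
response_text = (
    '{"items": ['
    '{"title": "Laptop", "price": "1 200 rub."}, '
    '{"title": "Mouse", "price": "35 rub."}'
    ']}'
)

# A simple regular expression: capture whatever is inside href="...".
links = re.findall(r'href="([^"]+)"', html)

# JSON text becomes ordinary Python dictionaries and lists.
data = json.loads(response_text)


def parse_price(text):
    """Keep only the digits of a price string and convert them to int."""
    digits = re.sub(r"\D", "", text)
    return int(digits)


prices = {item["title"]: parse_price(item["price"]) for item in data["items"]}

print(links)
print(prices)
```

Regular expressions are fine for small, predictable fragments like these; for whole pages a proper HTML parser is usually the safer tool.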
13 sprint 3 weeks
Graduation project
In the last project, confirm that you have mastered a new profession. Clarify the customer’s task and go through all stages of data analysis. Now there are no lessons or homework - everything is like at a real job.
The final sprint includes project work, A/B testing and SQL tasks, and an additional task. The project contains a statement of the problem, the expected result, a set of data and their description.
The task relates to one of five business areas:
• banks,
• retail,
• games,
• mobile applications,
• e-commerce.
There will be no usual description of steps in the project. You will work through them yourself.
SQL, Python, Pandas, Tableau, Dashboards, PostgreSQL, Decomposition, A/B testing