Data analysis in Python and databases (SQL) - free course from RANEPA, training, Date: March 6, 2023.
Miscellaneous / / December 04, 2023
The Python Data Analysis and Database (SQL) program will provide students with an understanding of the fundamentals of programming ML (Machine Learning) algorithms using the Python language. It will also help you master a number of competencies in the field of managing modern relational databases and acquire practical skills in using DBMS language tools (Database Management Systems data).
In business and government, huge amounts of data need to be constantly analyzed to obtain information to control the quality of work, put forward fresh ideas and justify accepted ones decisions. Experts who understand massive amounts of data can expect fast-track careers with impressive salaries.
As a rule, SQL and Python are most often mentioned in job advertisements related to data analysis.
Only students who are not studying in the following areas can apply for the course:
During the course of mastering the program, students will have the opportunity to come up with new digital products (independently or in a team). The authors of the best solutions will receive the support of the Academy and the opportunity to collaborate with the Artificial Intelligence Laboratory and the Platform Development Laboratory of RANEPA.
Python
How does it all work? Python Features.
Why is everyone using Python? Comparison with other programming languages. How to write programs? Pros of Python. How to use Python in tasks: word processing, working with images, writing stock robots, launching chat bots. Input and output in Python. Names in Python. Basic operations. Control of program execution progress (conditional constructs). Examples of simple programs.
Cycles. Lines. String methods.
Loop with precondition. Break and Continue statements. For loop Line structure and line length. The line element and how to cut lines. Getting a substring from a string. Line traversal. Sections.
Tuples. Lists. function map. Random value list generator.
Why are tuples needed? Working with a list. List elements. Print the list. Correct copying of lists. Features of using the map function. What are random value generators? A bunch of. Operations with sets. Frozenset. Dictionaries. get method Dictionary traversal.
Functions. Application of mathematical functions. Scope and exclusion.
How to write functions correctly. Simple algorithms. Writing the first functions. How to add math. Recommendations for writing your own functions. How scopes work. Global variables. Returning multiple values from a function. Exceptions.
Working with files. Matrices. Types of sorting
What can you do with the files? Opening and closing files. Reading and writing text files. Matrices. Filling out matrices. Entering matrices row by row. Reading and writing matrices. Bubble sort. Analysis of the algorithm. Several versions of the solution.
Overview of the NumPy library. Working with NumPy functions. SciPY library.
Python libraries and their installation. Basic NumPy data types. Numerical functions. Working with arrays. Operations with arrays. Two-dimensional arrays. Matrices. Basic features of the SciPy library.
Pandas library overview. Basic analytics.
Installing the library. Creating a Series object. DataFrame indexing. Tables. Reading and writing files. Indexing. Data sampling. Operations with rows and columns. Working with NaN. Sorting.
Data visualizations. Practical examples
Studying new data, preprocessing, data visualization, searching for connections between features, preparing training data and creating credit risk forecasting models. Obtaining a model that answers the question: to issue or not to issue a loan.
Automation of routine tasks. Practical examples
Analysis of practical examples of automation of tasks related to obtaining aggregated information from several sources (Excel files) or the Internet.
Review of machine learning methods used for data analysis. Practical examples
A practical example of identifying basic features and their interpretation in a supervised learning problem.
Interim module assessment
Interim certification takes place in the form of a test (computer testing). Sample of a typical test building:
What Python function can you use to find out the type of a variable?
- hwoami
- type
- typeof
- is
What range of numbers will be created in the following code: range (9, 3, -2)
- 9 8 7 6 5 4 4
- 9 7 5 3
- 9 7 5
- 11 9 8 7 6 5 4
What does the code snippet in the listing allow you to execute?
- Code snippet: print(math.sqrt(2))
- displaying the number 2 from the math library
- connecting the math library
- squaring a math number
- displaying the square root of two
SQL
Data storage and integrity.
Data storage: data files, logs, RAID levels, Tempdb. Relationships in databases, one to one, one to many, many to many. Data integrity: PRIMARY KEY, CHECK, UNIQUE and FOREIGN KEY. Normalization of tables.
Creating a database and tables.
DBMS architecture: logical (tables and data type, keys, indexes, views, assemblies, constraints, rules, default values) and physical levels (files and file groups, pages). Database creation. Table field data types: Exact numbers, Unicode character strings, approximate numbers, binary data, date and time, other data types, character strings. Creating tables. Where to write requests?
Indexes. Diagrams.
Indexes: clustered, non-clustered, unique identification, filtered, columnstore, hash, non-clustered, memory optimized index. ER diagrams: Database Diagrams, Database Node. DML and the most common SQL operators: SELECT, INSERT, UPDATE, DELETE. The importance of unique factors.
Requests. External connections.
Queries on one table. Sampling without repetitions. Queries on multiple tables. Outer joins: LEFT, RIGHT or FULL.
Aggregate functions. Subqueries.
Subqueries: WHERE COUNT, SUM, AVG, MAX, MIN, IN, ALL, ANY. Joining data: SELECT JOIN. Aggregate functions: AVG, CHECKSUM_AGG, COUNT, COUNT_BIG, GROUPING, GROUPING_ID, MAX, MIN, STDEV, STDEVP, STRING_AGG, SUM, VAR, VARP. Conditions for aggregate functions.
Useful features.
A little more about operators. Industrial request. Useful functions: SQRT, RAND, CONCAT, numeric and string functions. Operation precedence and type conversion: CAST, CONVERT.
Transactions. Representation.
One request in four ways. Transactions: concept of transactions - atomicity, consistency, isolation, durability, transaction management - COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION. Parallel data access: problems of dirty, non-repeatable reading, phantoms. Views: indexed, partitioned, system.
Variables and loops.
Start of programming. Variables: DECLARE, SET, SELECT. Code structure. Data description operators: CREATE, DROP, ALTER, etc. Data manipulation operators: INSERT, DELETE, SELECT, UPDATE, etc. Operators for setting access rights in the database: GRANT / REVOKE, LOCK / UNLOCK, SET LOCK MODE Operators for protection, data recovery and other operators. Cycles: WHILE.
Stored procedures and functions.
Temporary tables. Transact-SQL cursors, server, client. Cursor types: unidirectional, static, Keyset, dynamic. Stored procedures and functions: CREATE PROCEDURE, CREATE PROC.
Triggers. Exceptions.
Conveyor HF. Triggers: AFTER, INSTEADOF. Exceptions: EXCEPT. Dynamic SQL using keyword and using stored procedure: EXECUTE IMMEDIATE.
DBMS. NoSQL databases.
Transactions in programming/DBMS/SQL. How requests are executed. How to improve query performance. NoSQL database and its advantages.
Interim module assessment
Interim certification takes place in the form of a test (computer testing). Sample of a typical test building:
What type of field data is the NUMBER type?
- string
- numeric
- binary
Which SQL command queries data?
- ALTER
- SELECT
- FROM
What is a transaction?
- this is a group of operations performed by the DBMS
- is a group of operations that has ACID properties
- this is the operation of executing an SQL command