Web Parsing in Python: a course from Stepik, 4350 rub., 63 lessons, dated October 29, 2023.
Miscellaneous / December 04, 2023
Scraping, or data parsing as it is called on the RuNet, is the automated collection of information followed by its storage, processing, and analysis.
With parsers, we can extract gigabytes of data in seconds, automatically and around the clock. Once we have mastered parsing, we can collect information from exchanges, parse articles and other resources, and use that data to write algorithms for training trading bots.
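The collect, store, process, and analyze steps can be sketched in a few lines. This is only a minimal illustration, not the course's own code: it uses the standard-library html.parser module instead of the BeautifulSoup library covered in the course, and the sample HTML, the class names "item", "name" and "price", and the ItemParser and scrape helpers are all hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Keyboard</span><span class="price">1500</span></li>
  <li class="item"><span class="name">Mouse</span><span class="price">700</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.items = []     # one dict per <li class="item">
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "item":
            self.items.append({})
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Text is recorded only while inside one of the tracked <span> tags.
        if self._field and self.items:
            self.items[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(html):
    """Collection and processing: extract items and convert prices to int."""
    parser = ItemParser()
    parser.feed(html)
    return [{"name": i["name"], "price": int(i["price"])} for i in parser.items]


items = scrape(SAMPLE_HTML)
stored = json.dumps(items)  # storage: serialize the records to JSON
average_price = sum(i["price"] for i in items) / len(items)  # analysis
print(stored)
print(average_price)
```

In a real parser the HTML would come from an HTTP response rather than a string constant, and the JSON would be written to a file or database.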
Your photos, social media account addresses, phone numbers and other contact information will always be at risk of being scraped if carelessly placed on websites.
On freelance exchanges, the lion's share of orders are requests to write parsers. Having mastered a profession that only seems complicated at first glance, you can easily earn a couple of hundred dollars. You will agree that this is a nice addition to your main job.
By collecting, processing, and classifying information with neural networks, we can teach them to make decisions for us.
Companies can analyze competitors' products, prices, and discounts and constantly fight for customers' attention, stealing information about new products from each other.
Parsing is not always the dark side of the cookie. In my practice, I often get quite harmless orders, for example, for parsing reviews or comments. The person who created the site simply does not want to fill it with content manually, because that is slow and tedious. It is easier to pay $100 for a ready-made database and spare yourself the monotonous, routine work.
Data scraping is completely legal. The possibilities of this tool, coupled with the analysis and classification of the data obtained, are essentially limitless. You can parse anything; you just need to know how. The fascinating world of information, big data, deep learning, and neural networks will open up before you. The main thing is not to stop: keep learning new things and keep moving forward.
Purpose of this course:
- Introduce you to the basic tools that are used for parsing/scraping;
- Teach you to use these tools in practice;
- Show you techniques that will help you parse any information from a website;
- Give you access, for the duration of the course, to a general chat where you can ask questions if something is unclear;
- And much more.
Introduction
1. Introduction
2. How much can you earn from scraping?
3. Feedback from students
4. Course content
HTML DOM tree
1. Introduction to DOM
2. Elements and their types
3. HTML Attributes
4. Finding elements on a page
Requests
1. Introduction to Requests
2. Installing the requests library
3. requests.get() method
4. Status codes
5. Getting the contents of the response object
6. Conclusion
BeautifulSoup
1. Introduction to BeautifulSoup4
2. Installation and Import
3. Making soup
4. Search for nodes and elements
5. Pagination
6. AJAX parsing
7. Parsing tabular data
8. Saving the result to Excel
9. Saving the result to JSON
10. Parsing JSON
Selenium
1. Introduction
2. Installing Selenium Webdriver
3. Options and Arguments
4. Finding elements with Selenium
5. Selenium Methods
6. Scrolling pages
7. Windows and Tabs
8. Explicit and implicit waits
Bonus
1. Examples of parsers
Parsing Telegram
1. Introduction
2. Installation, configuration and imports
3. Basic Telethon Methods
4. Parsing data of group members
5. Parsing group messages
6. Sending the parsing result to Telegram
7. Feedback
Asynchronous parsing
1. Introduction to Asyncio
2. Installation, configuration, imports
3. Getting started with asyncio
4. Event loop
5. Awaitable objects
6. Basic Asyncio Methods and Functions
7. aiohttp
8. Cooking asynchronous soup
9. aiofile
Bypassing CAPTCHA
1. Introduction to CAPTCHA
2. Installation, configuration, imports
3. Bypassing regular CAPTCHA
4. Bypassing text CAPTCHA
5. Bypassing reCAPTCHA V2
6. Bypassing Invisible reCAPTCHA V2
7. Bypassing reCAPTCHA V3
8. Bypassing reCAPTCHA Enterprise
9. Bypassing Grid
10. Bypassing Coordinates
11. Bypassing Geetest and Geetest v4
12. Bypassing hCaptcha
13. Bypassing Yandex Smart Captcha
14. Bypassing Lemin Cropped Captcha