
Scraping is a simple concept in its essence, but it's also tricky at the same time. It's like a cat and mouse game between the website owner and the developer operating in a legal gray area. This article sheds light on some of the obstructions a programmer may face while web scraping, and different ways to get around them. Please keep in mind the importance of scraping with respect.

Web scraping, in simple terms, is the act of extracting data from websites. It can either be a manual process or an automated one.

The first step involves using built-in browser tools (like Chrome DevTools and Firefox Developer Tools) to locate the information we need on the webpage and identifying structures/patterns to extract it programmatically. The following steps involve methodically making requests to the webpage and implementing the logic for extracting the information, using the patterns we identified. Finally, we use the information for whatever purpose we intended.

For example, let's say we want to extract the number of subscribers of PewDiePie and compare it with T-series. A simple Google search leads me to Socialblade's Real-time Youtube Subscriber Count Page. From visual inspection, we find that the subscriber count is inside a tag with ID rawCount. Let's write a simple Python function to get this value. We'll use BeautifulSoup for parsing the HTML.

Seems like an easy process, right? What could go wrong? The answer to this mostly depends upon the way the site is programmed and the intent of the website owner. They can deliberately introduce complexities to make the scraping process tricky. Some complexities are easy to get around, and some aren't. Let's list these complexities one by one, and see the solutions for them in the next section.

Complexities of Web Scraping

Asynchronous loading and client-side rendering
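To make the subscriber-count example above concrete, here is a minimal sketch of such a function. It is not the original code from this article. The function names, the User-Agent string, the use of urllib for fetching, and the sample markup (including the `<p>` tag and the number in it) are assumptions made for illustration; only the ID rawCount and the choice of BeautifulSoup come from the text.

```python
import re
from typing import Optional
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def parse_subscriber_count(html: str) -> Optional[int]:
    """Return the number inside the element with ID rawCount, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id="rawCount")
    if element is None:
        # The element is not present in the HTML we were given.
        return None
    # Drop separators such as commas or spaces and keep ASCII digits only.
    digits = re.sub(r"[^0-9]", "", element.get_text())
    return int(digits) if digits else None


def get_subscriber_count(url: str) -> Optional[int]:
    """Fetch a page and parse it. The URL is supplied by the caller."""
    # Hypothetical User-Agent; identify your scraper honestly.
    request = Request(url, headers={"User-Agent": "subscriber-count-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_subscriber_count(html)


if __name__ == "__main__":
    # Invented sample markup, used so the sketch runs without network access.
    sample = '<html><body><p id="rawCount">97,000,000</p></body></html>'
    print(parse_subscriber_count(sample))
```

Keeping the parsing separate from the fetching makes the extraction logic easy to check against saved HTML. Note that BeautifulSoup only sees the HTML it is handed: if a page fills in the count later with client-side JavaScript, the element may be empty or missing in the fetched markup, and this function would return None.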
