5 Anti-Scraping Techniques You May Encounter

Aug 9, 2019

Photo by Ian Schneider on Unsplash

With the advent of big data, people have started to obtain data from the Internet for analysis with the help of web crawlers. There are various ways to build your own crawler: browser extensions, Python coding with Beautiful Soup or Scrapy, and data extraction tools like Octoparse.

However, there is always a coding war between spiders and anti-bots. Web developers apply different kinds of anti-scraping techniques to keep their websites from being scraped. In this article, I have listed the five most common anti-scraping techniques and how they can be avoided.


One of the easiest ways for a website to detect web scraping is IP tracking. The website can judge whether the traffic from an IP address comes from a robot based on its behavior. When a website finds that an overwhelming number of requests has been sent from a single IP address, either periodically or within a short period of time, there is a good chance the IP will be blocked on suspicion of being a bot. In this case, what really matters when building a crawler that avoids detection is the number and frequency of visits per unit of time. Here are some scenarios you may encounter.

Scenario 1: Making multiple visits within seconds. No real human can browse that fast. So, if your crawler sends frequent requests to a website, the website is likely to identify it as a robot and block the IP.

Solution: Slow down the scraping speed. Setting a delay (e.g. a “sleep” function) before executing a step, or increasing the waiting time between two steps, usually works.
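As a rough illustration of the fixed delay described above, here is a minimal Python sketch. The function name `crawl_with_delay`, the `fetch` callable, and the 2-second default are assumptions made for this example; they do not come from any particular library, and a suitable delay depends on the site being crawled.

```python
import time
from typing import Callable, Iterable, List


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing a fixed time between requests.

    `fetch` is a hypothetical callable supplied by the caller (for example,
    a thin wrapper around an HTTP client). `sleep` is injectable so the
    pause can be replaced in tests.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

Passing the fetch step in as a parameter keeps the pacing logic separate from whichever HTTP client you use.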

Scenario 2: Visiting a website at the exact same pace. Real humans do not repeat the same behavioral pattern over and over again. Some websites monitor request frequency, and if requests arrive periodically in exactly the same pattern, such as once per second, the anti-scraping mechanism will very likely be activated.

Solution: Set a random delay for every step of your crawler. With a random scraping speed, the crawler behaves more like a human browsing a website.
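The random delay described above could be sketched in Python as follows. The name `crawl_with_random_delay`, the `fetch` callable, and the 1 to 5 second range are assumptions chosen for illustration, not values recommended by any specific site or tool.

```python
import random
import time
from typing import Callable, Iterable, List


def crawl_with_random_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_seconds: float = 1.0,
    max_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, waiting a random time between requests.

    `fetch` is a hypothetical callable supplied by the caller. Each pause
    is drawn uniformly from [min_seconds, max_seconds], so requests do not
    arrive at a fixed interval.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Draw a fresh delay for every gap between two requests.
            sleep(random.uniform(min_seconds, max_seconds))
        pages.append(fetch(url))
    return pages
```

A uniform distribution is only one simple choice; the point is that the interval between two requests is no longer constant.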

Scenario 3: Some high-level anti-scraping techniques would incorporate complex algorithms to track…

