Whitelisting IP Address

ivan · August 30, 2021, 7:24am

Hi,

I am trying to crawl the website with Screaming Frog SEO Spider, but most of the URLs return a response code of 0 (connection timeout).

It is probably due to the crawl delay in the robots.txt file. Can you whitelist two of my IP addresses so I can perform a crawl?

Thanks

donogh · August 30, 2021, 8:53am

Hi Ivan,

It’s most likely your crawl rate is too high and you are triggering our anti-DDoS protection.

The bot needs to respect the standard crawl-delay set in robots.txt for all sites.

Even then, I would suggest crawling at half that speed and making sure the bot stays within the limit on simultaneous requests as well as on request rate.
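As a rough illustration of that advice, here is a minimal Python sketch of a crawler loop that reads the crawl-delay from robots.txt, doubles it (half the permitted speed), and only ever has one request in flight. This is not Screaming Frog configuration and not the platform's own tooling; `polite_delay`, `crawl`, the `fetch` callable, and the `safety_factor` and `default_delay` parameters are all hypothetical names chosen for this example.

```python
import time
import urllib.robotparser


def polite_delay(robots_lines, user_agent="*", safety_factor=2.0, default_delay=5.0):
    """Seconds to wait between requests: the robots.txt crawl-delay times safety_factor.

    safety_factor=2.0 means crawling at half the permitted speed.
    default_delay is an assumed fallback for when no crawl-delay is declared.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # robots.txt content as an iterable of lines
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay
    return float(delay) * safety_factor


def crawl(urls, fetch, delay, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing `delay` seconds between requests.

    Because the loop is sequential, there is never more than one
    simultaneous request. `fetch` is any callable that takes a URL.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

With a robots.txt that contains `User-agent: *` followed by `Crawl-delay: 5`, `polite_delay` returns 10.0, so the loop would wait ten seconds between requests.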

To get your IP unblocked, you will need to open a ticket, and the support team will provide a standard agreement you need to consent to in order to have crawling access re-enabled.

Bear in mind the entire system is automated and we cannot allow exceptions for platform-wide performance reasons. If the crawl limits are exceeded again, even if you previously had access reinstated, you will be automatically blocked again.

Kind regards,
Donogh

ivan · August 30, 2021, 9:48am

Thanks for your response, Donogh.

The robots.txt file specifies a crawl delay of 5 seconds. The issue is that the website has well over 9,000 URLs, which essentially makes it impossible to crawl.
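For scale, the figures in this post can be turned into a rough duration estimate. The sketch below assumes exactly 9,000 URLs, one request at a time, and ignores server response time; `crawl_hours` is a hypothetical helper written for this example.

```python
def crawl_hours(url_count, delay_seconds):
    """Lower bound on crawl duration in hours for a sequential crawl."""
    return url_count * delay_seconds / 3600


at_robots_delay = crawl_hours(9000, 5)   # 12.5 hours at the 5-second crawl-delay
at_half_speed = crawl_hours(9000, 10)    # 25.0 hours at half that speed
```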

Is there any workaround that would allow sites of this size to be crawled in a reasonable time frame?

Ivan

donogh · August 30, 2021, 10:01am

Hi Ivan,

Google and Facebook happily obey the crawl delay, so, with respect, I don’t see any issue here. The same is also true for much larger sites, with upwards of 50,000 items.

Allowing intensive crawlers on client sites has a hugely negative impact on the performance of the entire SaaS platform. If every client were hammering their sites with crawlers like that, it would literally cost our retailers money.

Therefore, I’m afraid this policy cannot be negotiated.

Can I ask what you are using the crawler for, please? We may be able to suggest better options.

Kind regards,
Donogh

ivan · August 30, 2021, 10:38am

Hi Donogh,

Thanks for your answer.

I’m not sure about Facebook, but Google does not take the crawl delay into consideration; it adjusts its crawl rate based on how the server responds.

I will figure something out for getting the data about each individual page.

Thanks for your time and prompt responses.

Regards,
Ivan

donogh · August 30, 2021, 11:04am

Sounds good. You’re very welcome. Please feel free to open a ticket if you need more specific assistance.
