SEO Crawlers Harming Performance: Why Overactive Bots Waste Resources and Damage Sustainability


TL;DR: Many website owners assume bots behave responsibly, but some SEO crawlers consume vast amounts of bandwidth, energy, and server time. SEO crawlers harming performance is now a problem for thousands of UK businesses, especially when certain tools ignore robots.txt instructions altogether.

This article explains why these bots behave this way, the strain they place on your hosting, the environmental cost, and why commercial crawlers are profiting from data taken from your site without offering anything in return.

SEO crawlers harming performance is a growing issue because some bots request thousands of pages in a short period.

This can slow websites, inflate hosting costs, and burn energy unnecessarily. In the UK, many businesses rely on shared hosting, so heavy crawling affects both performance and carbon footprint.

This guide explains how overactive crawlers behave, why they sometimes ignore robots.txt, how this leads to environmental waste, and what you can do to protect your site without harming search engine visibility.

 

Why Some SEO Crawlers Overload Servers

Search engines need to crawl the web to index content. Google, Bing, and most reputable engines follow crawling policies carefully. However, several commercial SEO tools (Ahrefs, SEMrush, Moz, etc.) run their own bots, and these often behave far less responsibly.

Their purpose is simple. They collect link data, page structure details, and domain level signals. They then monetise this data by selling access through subscriptions or API products. Your website is providing data for free, and in many cases these bots return nothing of value. They cost you bandwidth and energy while generating revenue for someone else.

Ahrefs claim on their website:

The goal of our crawling is to help site owners improve their online presence, while minimizing load on their servers.

We’ll prove why that just isn’t the case and why they don’t respect robots.txt.

Real Example: When robots.txt Has No Effect

QED blocked AhrefsBot in our robots.txt file on 1 November. It crawled again on 2 November, so the instruction was either ignored or processed too slowly to matter.

After we added a server level block using .htaccess, the bot still attempted to crawl again on 4 November, and again after that, according to an Ahrefs employee in an email.

This behaviour shows how voluntary and unreliable robots.txt compliance can be. If a crawler decides it wants data, it often tries to take it regardless of what your site requests.

Crawl Volume From Your Logs

The QED website October logs reveal the imbalance clearly:

    • AhrefsBot: 4,676 requests
    • SemrushBot: 870 requests
    • DotBot: 289 requests

AhrefsBot alone issued nearly as many requests as Google.

Unlike Google, it does not drive traffic, support search visibility, or provide anything that benefits our business. What it does generate is CPU load, energy use, and pressure on our hosting plan.

 

How Heavy Crawling Consumes Server Resources

Bandwidth and Transfer Waste

Each crawl request triggers file transfer. Many UK hosting plans still include bandwidth limits or apply fair usage rules.

Overactive SEO crawlers can consume gigabytes of transfer without offering any return. This is particularly wasteful for image-heavy sites, hospitality businesses with menus and galleries, or any page that involves large media files.
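As a rough illustration, if an average crawled page weighs around 1.5 MB once images, scripts, and styles are included (an assumption purely for the sake of the example; your own pages may be lighter or heavier), AhrefsBot’s 4,676 October requests represent roughly 7 GB of transfer in a single month, none of it serving a real visitor.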

CPU Usage and Server Load

Every request requires the server to generate or serve a page. On shared hosting, this means your site competes with other websites on the same server. Hundreds or thousands of crawler hits in a short window create spikes in CPU use. When this occurs, your hosting provider may throttle your account automatically, resulting in slower speeds for visitors.

Cache Pressure and Rebuilding

Many sites rely on caching layers to serve pre-generated pages. Overactive crawlers frequently request URLs that are not cached or that vary with parameters. This forces the server to rebuild pages repeatedly. Each rebuild consumes CPU cycles, memory, and electricity. For larger CMS platforms such as WordPress this can be significant.

When SEO Crawlers Trigger Database Queries

Dynamic pages, such as listings, booking pages, product filters, or search results, often require database queries. A bot that requests these pages repeatedly can generate hundreds of unnecessary database reads. This slows your site and increases energy use on the server cluster.

 

The Environmental Cost of Unnecessary Crawling

Digital services have a measurable carbon footprint. Each server request consumes electricity. Even a small action like serving a cached page uses energy across:

  • The origin server
  • The network infrastructure
  • The CDN or caching layer
  • The visitor’s browser environment

When a crawler makes thousands of unnecessary requests, this consumption multiplies. The environmental impact is not visible to the user, but it is real.

 

Why Aggressive Crawling Is Unsustainable

  1. Server farms require cooling. Extra CPU load increases heat output which in turn increases cooling demand.
  2. Bandwidth requires energy to transfer. Data flowing through networks uses power at every stage.
  3. Rebuilding pages wastes resources. Dynamic content generation creates spikes in energy usage.

Crawlers do not offset this usage. Search engines justify their impact through user value. Commercial SEO bots do not.

If thousands of websites experience this level of unnecessary load, the collective carbon cost is substantial. A crawler that takes data for free is effectively using your resources, your hosting plan, and your energy footprint to fuel its commercial product.

 

Why Some SEO Crawlers Ignore robots.txt

Robots.txt relies on goodwill. There is no enforcement mechanism. The bots that respect it do so because reputation matters.

Google and Bing cannot afford to be seen as disrespectful. Commercial SEO tools have different incentives.

Profit Driven Crawling

Bots like AhrefsBot and SEMrushBot exist to feed paid services. The more data they collect, the more valuable their product becomes.

This creates an incentive to crawl aggressively and to revisit pages often, even when the site owner has explicitly requested otherwise.

Slow or Infrequent robots.txt Refresh

Some SEO bots fetch robots.txt infrequently or process it slowly. If you block them today, the crawler may not update its instructions for several days.

During that time it will keep requesting pages, as AhrefsBot did with us, until we blocked it with .htaccess.

Lack of Penalty

Ignoring robots.txt carries no consequence. The crawler does not lose ranking or visibility. It simply collects data until the server forcibly blocks it.

How This Data Is Monetised Without Your Permission

This is a key point. SEO tools gather your site’s structure, links, and technical signals. They package this into:

  • Commercial link index products
  • Backlink databases
  • Domain rating or authority metrics
  • Paid keyword analysis tools
  • Competitor audit features

Your site fuels their platform.

You pay for hosting, processing, and energy use. They pay nothing to collect the data. In many cases they ignore your request not to take it.

The imbalance is striking. You bear the cost while the crawler earns the revenue.

 

What You Can Do To Protect Your Site

1. Maintain a Clear robots.txt

Robots.txt is still the first line of defence. Many crawlers respect it, even if some do not.
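As a starting point, a minimal robots.txt along these lines keeps genuine search engines crawling while asking the heaviest commercial bots to stay away. The user agent names are the commonly published ones; check your own logs for the exact strings, and remember that compliance is entirely voluntary:

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

# All other crawlers, including Googlebot and Bingbot, may crawl normally
User-agent: *
Disallow: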

2. Use Server Level Blocking

Server rules are far more effective than robots.txt. Our experience supports this. Once the .htaccess block was added, AhrefsBot was prevented from crawling despite further attempts.

Examples include:

  • Deny rules for specific user agents
  • Blocking known bot IP addresses
  • Pattern matching to stop fake user agents
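For example, on Apache 2.4 an IP-based block can sit alongside the user agent rules shown later in this article. This is a sketch only: the range below is a documentation placeholder, so substitute the addresses the bot publishes or the ones that appear in your own logs, and check that your host allows these directives in .htaccess:

# Refuse requests from a known crawler IP range
# 203.0.113.0/24 is a placeholder - replace it with the real ranges
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
</RequireAll>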

3. Add Rate Limits or WAF Controls

Some hosting providers, including many UK-based companies, offer security tools that allow you to limit the number of requests per minute. This stops overactive crawlers from hammering your site.
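If your server runs Apache and you have access to the main configuration, mod_evasive offers simple request-rate controls. The sketch below is illustrative and assumes the module is installed; on shared hosting the same limits are usually set through the provider’s own control panel or firewall instead:

<IfModule mod_evasive24.c>
    # Allow at most 10 requests for the same URL per 1-second interval
    DOSPageCount 10
    DOSPageInterval 1
    # Allow at most 100 requests across the whole site per 1-second interval
    DOSSiteCount 100
    DOSSiteInterval 1
    # Refuse an offending IP address for 60 seconds
    DOSBlockingPeriod 60
</IfModule>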

4. Monitor Logs Regularly

Our October logs made the issue clear. Without them, we would not have known the scale of the problem. Reviewing logs monthly allows you to identify new bots before they cause wider issues.
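If you have shell access to your server, a quick way to get the same picture from a raw access log is to count requests per user agent. The log path below is a common Apache default and will differ on many hosts:

# Count requests from the main crawlers in the current access log
# (adjust the path to match your hosting environment)
for bot in Googlebot AhrefsBot SemrushBot DotBot MJ12bot; do
    printf '%s: %s\n' "$bot" "$(grep -ci "$bot" /var/log/apache2/access.log)"
done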

5. Avoid Blocking Genuine Search Engines

Always allow Google, Bing, and other reputable search engines. Their crawling is essential for your visibility and they follow crawl policies reliably.

A Practical Example of Resource Waste

In October, our website received:

    • Google: 4,795 requests (expected and beneficial)
    • AhrefsBot: 4,676 requests (no value, high load)
    • SemrushBot: 870 requests
    • DotBot (Moz): 289 requests

Google brought customers. The SEO crawlers (Ahrefs, SEMrush and Moz) consumed electricity, CPU time, and bandwidth without providing anything useful. After server level blocking was added, site performance improved and carbon usage declined because unnecessary page generation stopped.

The Solution: .htaccess file

While we have previously supplied a robots.txt file, which is available on the Downloads page, for excessive SEO crawlers like Ahrefs the solution is to add the following code to your .htaccess file:

# ----------------------------------------------------------------------
# Block aggressive SEO crawlers (Ahrefs, Semrush, Moz, Majestic, DotBot)
# ----------------------------------------------------------------------
RewriteEngine On

# Block AhrefsBot
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
RewriteRule .* - [F,L]

# Block SemrushBot variants
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule .* - [F,L]

# Block Moz's rogerbot
# (do not match on "Moz" alone: almost every browser user agent starts with "Mozilla")
RewriteCond %{HTTP_USER_AGENT} rogerbot [NC]
RewriteRule .* - [F,L]

# Block Majestic SEO bots
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F,L]

# Block DotBot (Moz's link index crawler)
RewriteCond %{HTTP_USER_AGENT} DotBot [NC]
RewriteRule .* - [F,L]

# Optional: block common scrapers that hide under generic names
# RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^Python [NC]
# RewriteRule .* - [F,L]

 

 

Conclusion: Doesn’t look great for AhrefsBot

Aggressive SEO crawlers, like Ahrefs, waste bandwidth, increase server load, and contribute to avoidable environmental damage.

They extract data from your site for free and use it to sell premium SEO tools. Robots.txt is not enough because it depends on voluntary compliance.

If your logs show heavy crawling from bots that provide no value, blocking them at server level is a sensible step.

At QED Web Design we now apply .htaccess level blocks for AhrefsBot and similar crawlers across our entire portfolio. Our AWStats data makes it clear that these bots do not respect robots.txt, despite their claims to the contrary.

Screenshot: AWStats log showing AhrefsBot crawl activity on the QED website

Tim Soulo, CMO at Ahrefs, refused to comment, but his understudy Ryan Law decided to try to defend his employer’s product when our research threw up this gem.

Screenshot: the LLM-generated result about AhrefsBot referenced in the exchange below

On LinkedIn, Law said:

hey, our crawlers respect robots.txt, you can read more here: https://ahrefs.com/robot

the only exception would be if you use Site Audit and you specifically instruct the Site Audit crawler to ignore it on your own website, but again, you can only do that on your own verified sites.

that screenshot looks like ChatGPT, and in this case, not accurate

As a company, QED doesn’t trust companies like Ahrefs, even more so after this incident. They don’t care about the websites or the people who own them, whilst their parasitic bots scrape our data.

Your next move should be to review your own server logs, identify which bots are using your resources, and decide which should be blocked to protect performance, cost, and sustainability.

 


To see the effect of our content creation, see our case study on The SV Group. We created content over a six-month period, targeting key areas where their business wanted to expand.