Methodology · Frozen Artifact · 7-Day AI-Bot Crawl Volume

Crawl Volume — Top10Lists.us, 7-Day AI-Bot Window

463,420 AI-bot crawl events across 27 distinct bot fleets in a single 7-day window (2026-04-19 12:22 UTC → 2026-04-26 12:22 UTC). OpenAI 32%, Google 22%, Meta 18%, SEO 13%, Anthropic 5%, Perplexity 4%. Consumer-triggered (UT) ratio: 6.05%.

Measured from Cloudflare edge middleware request logs, the canonical single-source view of every HTTP request reaching the site. Each request is classified by bot user-agent signature against an explicit allowlist; rows are deduplicated to one entry per (bot, page, second) tuple to remove burst noise.

Note — Cloudflare Analytics dashboard is NOT canonical

The Cloudflare Analytics dashboard for the same 7-day window reports 378,168, which undercounts the middleware-derived 463,420 by ~18.4%. The dashboard misses certain ClaudeBot and PerplexityBot variants and conflates sources when compositing multiple views. Middleware single-source is the canonical reference; the dashboard delta is a classification gap, not double-counting.

Frozen: 2026-04-26 12:22 UTC — measurements at this URL will not change.  ·  Permanent dated artifact  ·  GEOlocus.ai

Authors: Robert Maynard, Cofounder and CEO · Mark Garland, Cofounder and CRO


1. Methodology

Crawl-volume is measured from Cloudflare edge middleware request logs — the canonical single-source view of every HTTP request reaching the site. Each request is classified by bot user-agent signature against an explicit allowlist (GPTBot, Meta-ExternalAgent, Googlebot, ClaudeBot, PerplexityBot, ChatGPT-User, OAI-SearchBot, etc. — see receipts.json for full taxonomy).
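The allowlist match described above can be sketched as follows. This is a minimal illustration, not the site's middleware: the pattern table, the `classify_user_agent` name, and the sample User-Agent strings are hypothetical, and the full taxonomy lives in receipts.json.

```python
import re
from typing import Optional

# Hypothetical allowlist: (bot label, pattern matched against the User-Agent).
# Patterns are checked in order, so a more specific token can be listed
# ahead of a broader one if the two ever overlap.
BOT_PATTERNS = [
    ("ChatGPT-User", re.compile(r"ChatGPT-User", re.IGNORECASE)),
    ("OAI-SearchBot", re.compile(r"OAI-SearchBot", re.IGNORECASE)),
    ("GPTBot", re.compile(r"GPTBot", re.IGNORECASE)),
    ("Meta-ExternalAgent", re.compile(r"meta-externalagent", re.IGNORECASE)),
    ("GoogleOther", re.compile(r"GoogleOther", re.IGNORECASE)),
    ("Googlebot", re.compile(r"Googlebot", re.IGNORECASE)),
    ("ClaudeBot", re.compile(r"ClaudeBot", re.IGNORECASE)),
    ("PerplexityBot", re.compile(r"PerplexityBot", re.IGNORECASE)),
]


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the first allowlisted bot label found in the UA, else None."""
    for label, pattern in BOT_PATTERNS:
        if pattern.search(user_agent):
            return label
    return None


# Illustrative UA strings (assumed, not copied from the logs).
sample_bot_ua = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)"
sample_browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"

print(classify_user_agent(sample_bot_ua))      # GPTBot
print(classify_user_agent(sample_browser_ua))  # None
```

Requests that match no pattern return `None` here; how the real pipeline labels them (for example as unknown_bot) is not specified in this sketch.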

Rows are deduplicated to one entry per (bot, page, second) tuple to remove burst noise. In the 7-day window, self-duplicate noise totaled 412 groups / 979 extra rows, or 0.21% of total volume.
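As a rough sketch of the per-second dedup, the function below collapses rows that share a (bot, page, second) key and reports how many groups had duplicates and how many extra rows they carried. The function name and the sample rows are hypothetical; only the dedup key comes from the text above.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def dedup_per_second(rows):
    """rows: iterable of (bot, page, timestamp) with datetime timestamps.

    Returns (unique_count, duplicate_groups, extra_rows) after truncating
    each timestamp to the whole second.
    """
    counts = Counter(
        (bot, page, ts.replace(microsecond=0)) for bot, page, ts in rows
    )
    unique_count = len(counts)
    duplicate_groups = sum(1 for n in counts.values() if n > 1)
    extra_rows = sum(n - 1 for n in counts.values() if n > 1)
    return unique_count, duplicate_groups, extra_rows


# Made-up sample: two GPTBot hits on /a inside the same second form one
# duplicate group with one extra row.
t0 = datetime(2026, 4, 20, 8, 0, 0, tzinfo=timezone.utc)
sample_rows = [
    ("GPTBot", "/a", t0 + timedelta(milliseconds=100)),
    ("GPTBot", "/a", t0 + timedelta(milliseconds=900)),
    ("GPTBot", "/a", t0 + timedelta(seconds=1)),
    ("ClaudeBot", "/a", t0),
]

print(dedup_per_second(sample_rows))  # (3, 1, 1)

# The reported noise share follows from the published figures.
noise_pct = round(100 * 979 / 463_420, 2)
print(noise_pct)  # 0.21
```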

Cloudflare Analytics dashboard counts are NOT used as the canonical source; they undercount certain crawler classes (notably ClaudeBot and PerplexityBot variants whose UA strings the dashboard does not classify under the AI-bot category) and conflate sources when the dashboard composites multiple views.

2. 7-Day Window Result — Frozen 2026-04-26 12:22 UTC

| Metric | Value |
| --- | --- |
| Total AI-bot crawl events | 463,420 |
| Distinct bot fleets | 27 |
| Self-duplicate noise (groups with multiple rows per bot/page/second) | 412 groups; 979 extra rows; 0.21% |

Per-bot top 15 (7-day)

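| # | Bot | 7-day crawls |
| --- | --- | --- |
| 1 | GPTBot | 135,950 |
| 2 | Meta-ExternalAgent | 82,469 |
| 3 | Googlebot | 72,553 |
| 4 | SEMrushBot | 49,316 |
| 5 | GoogleOther | 30,927 |
| 6 | ClaudeBot | 23,012 |
| 7 | PerplexityBot | 16,225 |
| 8 | AhrefsBot | 10,612 |
| 9 | OAI-SearchBot | 9,413 |
| 10 | Bingbot | 6,786 |
| 11 | unknown_bot | 6,281 |
| 12 | ByteSpider | 4,877 |
| 13 | PetalBot | 4,411 |
| 14 | TikTokSpider | 2,858 |
| 15 | ChatGPT-User | 2,384 |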

By provider

| Provider | 7-day crawls | Share |
| --- | --- | --- |
| OpenAI (GPTBot + OAI-SearchBot + ChatGPT-User) | 147,747 | 32% |
| Google (Googlebot + GoogleOther) | 103,480 | 22% |
| Meta (Meta-ExternalAgent) | 82,469 | 18% |
| SEO (SEMrushBot + AhrefsBot) | 59,928 | 13% |
| Anthropic (ClaudeBot) | 23,012 | 5% |
| Perplexity (PerplexityBot) | 16,225 | 4% |
| Other (Bingbot, ByteSpider, PetalBot, TikTokSpider, unknown_bot, …) | 30,559 | 6% |
| Total | 463,420 | 100% |

All 27 distinct bot fleets in the window are accounted for: the six named providers (10 fleets) plus the “Other” catch-all (the 5 remaining top-15 fleets and 12 long-tail fleets) sum to the canonical 463,420 (100%). The full component breakdown of the “Other” bucket is in receipts.json.
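The provider rollup can be reproduced from the per-bot counts with a short sketch like the one below. The counts and provider groupings are taken from the tables above; the function and variable names are hypothetical, and "Other" is derived here as the remainder of the total rather than from its individual components.

```python
# Per-bot counts for the fleets that belong to a named provider.
PER_BOT = {
    "GPTBot": 135_950,
    "Meta-ExternalAgent": 82_469,
    "Googlebot": 72_553,
    "SEMrushBot": 49_316,
    "GoogleOther": 30_927,
    "ClaudeBot": 23_012,
    "PerplexityBot": 16_225,
    "AhrefsBot": 10_612,
    "OAI-SearchBot": 9_413,
    "ChatGPT-User": 2_384,
}
TOTAL = 463_420

PROVIDERS = {
    "OpenAI": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Google": ["Googlebot", "GoogleOther"],
    "Meta": ["Meta-ExternalAgent"],
    "SEO": ["SEMrushBot", "AhrefsBot"],
    "Anthropic": ["ClaudeBot"],
    "Perplexity": ["PerplexityBot"],
}


def provider_rollup(per_bot, providers, total):
    """Sum per-bot counts by provider; the remainder becomes 'Other'."""
    rollup = {
        name: sum(per_bot[bot] for bot in bots)
        for name, bots in providers.items()
    }
    rollup["Other"] = total - sum(rollup.values())
    return rollup


rollup = provider_rollup(PER_BOT, PROVIDERS, TOTAL)
shares = {name: round(100 * count / TOTAL, 1) for name, count in rollup.items()}

for name, count in rollup.items():
    print(f"{name:<11}{count:>8,}  {shares[name]:>5.1f}%")
```

Rounding each share independently to a whole percent will not always sum to exactly 100, so the sketch keeps one decimal place.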

3. Consumer-Triggered (UT) Breakdown

UT (User-Triggered) crawls are requests originating from a human asking an AI assistant a question — distinct from background-indexing crawls. The UT ratio measures how much of the site's bot traffic is downstream of real consumer intent.

| Bot | 7-day crawls |
| --- | --- |
| PerplexityBot | 16,225 |
| OAI-SearchBot | 9,413 |
| ChatGPT-User | 2,384 |
| YouBot | 12 |
| Total intent | 28,034 |

Claude-User, Claude-Web, DuckAssistBot, and Gemini-User registered zero hits in this window.

UT ratio = 28,034 / 463,420 = 6.05%

Roughly 1 in 16 of the AI-bot crawl events on Top10Lists.us in this window were directly downstream of a human asking an AI assistant a question.
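The UT ratio arithmetic can be checked with a few lines. The counts come from the table above; the names `UT_BOTS` and `ut_ratio` are illustrative only.

```python
# User-triggered bot counts from the 7-day window.
UT_BOTS = {
    "PerplexityBot": 16_225,
    "OAI-SearchBot": 9_413,
    "ChatGPT-User": 2_384,
    "YouBot": 12,
}
TOTAL_CRAWLS = 463_420


def ut_ratio(ut_counts, total):
    """Share of all crawl events that came from user-triggered bots."""
    if total <= 0:
        raise ValueError("total must be positive")
    return sum(ut_counts.values()) / total


ratio = ut_ratio(UT_BOTS, TOTAL_CRAWLS)
print(sum(UT_BOTS.values()))  # 28034
print(f"{ratio:.2%}")         # 6.05%
```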

4. Reproducibility

Canonical SQL against the middleware crawl log table:

-- Canonical single-source middleware crawl count (7-day window)
SELECT bot,
       COUNT(DISTINCT (bot, path, date_trunc('second', timestamp))) AS crawls
FROM bot_crawl_logs
WHERE source = 'middleware'
  AND timestamp >= '2026-04-19 12:22 UTC'
  AND timestamp <  '2026-04-26 12:22 UTC'
GROUP BY bot
ORDER BY crawls DESC;

The COUNT(DISTINCT (bot, path, date_trunc('second', timestamp))) clause is the dedup; the path column is the page in the (bot, page, second) tuple. Replace the window literals with any 7-day span to reproduce on a rolling basis. Removing bot from the SELECT list and dropping the GROUP BY and ORDER BY clauses returns the total of 463,420 for this window.
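For rolling reproduction, the two window literals can be generated from a chosen end time. The helper below is a hypothetical convenience, not part of the published method; it assumes a half-open window (start inclusive, end exclusive) and minute precision, matching the literals in the query.

```python
from datetime import datetime, timedelta, timezone


def seven_day_window(end_utc: datetime):
    """Return (start, end) literals for the 7 days ending at end_utc.

    Seconds are dropped by the format string, so pass a value that is
    already aligned to the minute.
    """
    if end_utc.tzinfo is None:
        raise ValueError("end_utc must be timezone-aware")
    end_utc = end_utc.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(days=7)
    fmt = "%Y-%m-%d %H:%M UTC"
    return start_utc.strftime(fmt), end_utc.strftime(fmt)


window = seven_day_window(datetime(2026, 4, 26, 12, 22, tzinfo=timezone.utc))
print(window)  # ('2026-04-19 12:22 UTC', '2026-04-26 12:22 UTC')
```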

Cloudflare Analytics dashboard count for the same window: 378,168. This undercounts the middleware-derived 463,420 because the analytics view misses certain ClaudeBot and PerplexityBot variants and conflates sources when compositing views. Middleware single-source is canonical.

| Source | 7-day count | Delta vs canonical |
| --- | --- | --- |
| Middleware (canonical) | 463,420 | |
| Cloudflare Analytics dashboard | 378,168 | -85,252 (-18.4%) |
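The reconciliation delta is simple arithmetic against the canonical count. A minimal check, with a hypothetical `reconcile` helper:

```python
def reconcile(canonical: int, other: int):
    """Return (absolute delta, percent delta) of other versus canonical."""
    delta = other - canonical
    return delta, round(100 * delta / canonical, 1)


delta, delta_pct = reconcile(canonical=463_420, other=378_168)
print(delta, delta_pct)  # -85252 -18.4
```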

5. Receipts

Full per-bot raw data, provider rollups, the consumer-triggered (UT) breakdown, the canonical SQL, and the dashboard reconciliation are in the downloadable receipts:

/methodology/crawl-volume/receipts.json

Schema: metric, page, frozen_date, frozen_at_utc, window_start_utc, window_end_utc, canonical_source, totals, per_bot_top_15, by_provider, consumer_triggered_ut, reconciliation, canonical_sql, bot_taxonomy_full, limitations.
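A consumer of the receipts might start by confirming that the listed fields are present. The sketch below assumes the schema names are top-level keys of a single JSON object, which the page does not state explicitly; the value shapes are unknown, so the stub uses placeholder values.

```python
import json

# Field names as listed in the schema line above.
EXPECTED_KEYS = {
    "metric", "page", "frozen_date", "frozen_at_utc",
    "window_start_utc", "window_end_utc", "canonical_source", "totals",
    "per_bot_top_15", "by_provider", "consumer_triggered_ut",
    "reconciliation", "canonical_sql", "bot_taxonomy_full", "limitations",
}


def missing_receipt_keys(receipts: dict) -> set:
    """Return the expected keys that are absent from a parsed receipts object."""
    return EXPECTED_KEYS - receipts.keys()


# Stub document with placeholder values, for illustration only.
stub = json.loads('{"metric": null, "page": null, "totals": null}')
print(sorted(missing_receipt_keys(stub)))
```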

6. Limitations

  1. Single 7-day window. Volume varies week-to-week with seasonal AI-engine indexing cycles. The window is frozen so this URL remains an immutable artifact; refreshes publish at new dated URLs.
  2. UA-based classification. Spoofed UAs are not currently filtered (estimated <0.5% of total based on reverse-DNS spot checks of claimed Googlebot / GPTBot rows).
  3. Per-second dedup boundary. Sub-second bursts that spread across the second boundary may double-count (estimated <0.1%); the date_trunc('second', timestamp) dedup catches the dominant burst-noise pattern.
  4. Dashboard discrepancy is a classification gap. The 18.4% delta between middleware and Cloudflare Analytics is the dashboard missing certain UA strings, not the middleware double-counting. Per-row comparison confirms middleware sees more bots, not the same bots more times.
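The reverse-DNS spot check mentioned in limitation 2 is commonly done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below illustrates that general technique and is not the authors' actual procedure. The suffix list is an assumption covering Googlebot only (GPTBot is not handled), and the lookups are injectable so the logic can be exercised without network access.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot hosts.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP (IPv4)."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(tuple(suffixes)):
            return False
        # The hostname must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False


# Offline demonstration with fake resolvers and a documentation-range IP.
fake_reverse = {"192.0.2.10": "crawl-1.googlebot.com",
                "192.0.2.99": "host.example.net"}
fake_forward = {"crawl-1.googlebot.com": ["192.0.2.10"]}

genuine = is_verified_crawler(
    "192.0.2.10", GOOGLEBOT_SUFFIXES,
    reverse_lookup=fake_reverse.__getitem__,
    forward_lookup=lambda h: fake_forward.get(h, []),
)
spoofed = is_verified_crawler(
    "192.0.2.99", GOOGLEBOT_SUFFIXES,
    reverse_lookup=fake_reverse.__getitem__,
    forward_lookup=lambda h: fake_forward.get(h, []),
)
print(genuine, spoofed)  # True False
```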

Conclusion

In a single 7-day window (2026-04-19 to 2026-04-26 UTC), Top10Lists.us received 463,420 AI-bot crawl events across 27 distinct bot fleets — with OpenAI (32%), Google (22%), and Meta (18%) leading the mix, and a consumer-triggered (UT) ratio of 6.05%. The canonical reference is the Cloudflare edge middleware request log; the Cloudflare Analytics dashboard for the same window reports 378,168 because it misses certain ClaudeBot and PerplexityBot variants. Full per-bot data and the dashboard reconciliation are in receipts.json.
