
January 27, 2025

Distributed Rate Limiting at the Edge: How Anon Coordinates Global Request Quotas

At Anon, we enable developers to build automation on top of websites that don't have APIs. This presents unique challenges around rate limiting: we need to respect each target site's rate limits while ensuring fair quota distribution across our global user base. This post explores how we built a distributed rate limiting system that coordinates request quotas across multiple regions while preventing account lockouts.

The Challenge

Our platform allows users to automate actions on third-party websites through browser automation. Each target website has its own rate limits and lockout policies. We need to:

  1. Respect per-site rate limits to avoid getting user accounts locked
  2. Fairly distribute quota across all users accessing a given site
  3. Coordinate limits across multiple regions in real-time
  4. Handle failure modes gracefully without compromising limits

A naive centralized rate limiter wouldn't work: every request would pay cross-region latency to reach it. Fully independent per-region limiters could collectively exceed a site's global quota. We needed a system that could coordinate globally while still making fast local decisions.

Our Solution: Hierarchical Token Buckets

We implemented a hierarchical token bucket system with three levels:

type RateLimiter struct {
  // Global bucket coordinated across regions
  globalBucket *TokenBucket
  // Per-region bucket for local rate limiting
  regionalBucket *TokenBucket
  // Per-user bucket for fair distribution
  userBuckets map[string]*TokenBucket
}

Each bucket implements the token bucket algorithm but with different refill rates and coordination mechanisms:

  1. Global buckets track the overall quota for a site across all regions
  2. Regional buckets get a portion of the global quota based on traffic patterns
  3. User buckets ensure fair distribution within a region
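
The post doesn't show the internals of TokenBucket, so here is a minimal sketch of the primitive all three levels share. The lazy-refill design and field names such as refillRate are our assumptions:

package ratelimit

import (
  "math"
  "sync"
  "time"
)

// TokenBucket is the primitive shared by all three levels. This sketch
// refills lazily on each acquire so no background goroutine is needed.
type TokenBucket struct {
  mu         sync.Mutex
  capacity   int64     // maximum number of tokens the bucket can hold
  tokens     float64   // current tokens (fractional during refill)
  refillRate float64   // tokens added per second
  lastRefill time.Time // when tokens were last added
}

func NewTokenBucket(capacity int64, refillRate float64) *TokenBucket {
  return &TokenBucket{
    capacity:   capacity,
    tokens:     float64(capacity),
    refillRate: refillRate,
    lastRefill: time.Now(),
  }
}

// tryAcquire takes one token if available and reports whether it succeeded.
func (b *TokenBucket) tryAcquire() bool {
  b.mu.Lock()
  defer b.mu.Unlock()

  now := time.Now()
  elapsed := now.Sub(b.lastRefill).Seconds()
  b.tokens = math.Min(float64(b.capacity), b.tokens+elapsed*b.refillRate)
  b.lastRefill = now

  if b.tokens >= 1 {
    b.tokens--
    return true
  }
  return false
}

Refilling lazily on each acquire keeps the bucket goroutine-free, which matters when a limiter holds one bucket per user.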

The key innovation is how we coordinate the global bucket across regions. Rather than trying to maintain perfect consistency, we use a probabilistic quota allocation system:

func (r *RateLimiter) acquireToken(userID string) bool {
  // Check user bucket first (fast local check)
  userBucket, ok := r.userBuckets[userID]
  if !ok || !userBucket.tryAcquire() {
    // Unknown users get no quota here; the lookup guards
    // against a nil dereference on a missing map entry
    return false
  }

  // Check regional bucket
  if !r.regionalBucket.tryAcquire() {
    return false
  }

  // Probabilistic global check based on region's quota share
  quotaShare := r.getRegionalQuotaShare()
  if rand.Float64() < quotaShare {
    // Skip global coordination for this request
    return true
  }

  // Do global coordination for a subset of requests
  return r.globalBucket.tryAcquireWithTimeout(50 * time.Millisecond)
}

This approach lets us make fast local decisions most of the time while still maintaining global limits probabilistically. The quota share for each region is continuously adjusted based on observed traffic patterns using a gossip protocol between regions.
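
The gossip exchange itself isn't shown in the post. As a rough illustration of how a region might derive its share from gossiped traffic rates, a hypothetical quotaShareEstimator (the type and its fields are our assumptions) could divide this region's observed rate by the total across all regions:

package ratelimit

import "sync"

// quotaShareEstimator is a hypothetical helper showing how a region could
// derive its quota share from gossiped traffic rates. The post describes
// the gossip protocol only at a high level.
type quotaShareEstimator struct {
  mu        sync.Mutex
  selfID    string
  peerRates map[string]float64 // region ID -> last gossiped requests/sec
}

// recordGossip stores the request rate reported by a peer region (or by
// this region itself, keyed by selfID).
func (q *quotaShareEstimator) recordGossip(region string, rate float64) {
  q.mu.Lock()
  defer q.mu.Unlock()
  q.peerRates[region] = rate
}

// share returns this region's fraction of observed global traffic, which
// acquireToken can use to decide how often to skip global coordination.
func (q *quotaShareEstimator) share() float64 {
  q.mu.Lock()
  defer q.mu.Unlock()
  var total float64
  for _, rate := range q.peerRates {
    total += rate
  }
  if total == 0 {
    return 0 // no data yet: coordinate globally on every request
  }
  return q.peerRates[q.selfID] / total
}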

Handling Edge Cases

One key challenge was handling failure modes gracefully. If global coordination fails, we fall back to conservative local limits:

func (r *RateLimiter) getEffectiveQuota() int64 {
  if !r.globalHealthy() {
    // Fall back to conservative local quota
    return r.regionalBucket.capacity / 2
  }
  return r.regionalBucket.capacity
}
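
The post doesn't define globalHealthy. One plausible reading, sketched below with a hypothetical healthTracker helper, is to treat the global path as healthy only if a global sync has succeeded recently; the five-second threshold is an assumption:

package ratelimit

import (
  "sync/atomic"
  "time"
)

// healthTracker is a hypothetical helper; the post does not show how
// globalHealthy is implemented. Here "healthy" means a global sync
// completed within the last five seconds.
type healthTracker struct {
  lastSyncNanos atomic.Int64 // UnixNano timestamp of the last successful sync
}

// markSynced records a successful round of global coordination.
func (h *healthTracker) markSynced() {
  h.lastSyncNanos.Store(time.Now().UnixNano())
}

// healthy reports whether global coordination has succeeded recently.
func (h *healthTracker) healthy() bool {
  last := time.Unix(0, h.lastSyncNanos.Load())
  return time.Since(last) < 5*time.Second
}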

We also implemented automatic recovery mechanisms for when accounts hit rate limits:

  • Exponential backoff when approaching limits (sketched after this list)
  • Circuit breakers to prevent cascading failures
  • Automatic quota redistribution when regions fail
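
To make the first of these concrete, here is an illustrative jittered exponential backoff; the base delay, cap, and full-jitter strategy are our assumptions rather than production values:

package ratelimit

import (
  "math/rand"
  "time"
)

// backoffDelay computes a jittered exponential backoff delay for retrying
// once an account approaches a site's limit.
func backoffDelay(attempt int) time.Duration {
  const (
    base     = 250 * time.Millisecond
    maxDelay = 30 * time.Second
  )
  if attempt > 20 {
    attempt = 20 // clamp to avoid shift overflow
  }
  d := base << uint(attempt) // double the delay on each attempt
  if d > maxDelay {
    d = maxDelay
  }
  // Full jitter: pick uniformly in [0, d) so retries don't synchronize.
  return time.Duration(rand.Int63n(int64(d)))
}

Full jitter spreads retries uniformly over the backoff window, which avoids the synchronized retry bursts that plain exponential backoff can produce.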

Results

This system has been running in production for over 6 months, handling millions of requests per day across 12 regions. Key metrics:

  • 99.9th percentile latency overhead < 1ms
  • Global quota accuracy within 5% of target
  • Zero account lockouts due to rate limit violations
  • Automatic recovery from regional failures in < 10 seconds

Lessons Learned

Building a distributed rate limiting system taught us several lessons:

  1. Perfect consistency isn't always necessary; probabilistic approaches can work well
  2. Local decision making with periodic global coordination provides a good trade-off
  3. Conservative fallbacks are critical for reliability
  4. Monitoring and automatic recovery are as important as the core algorithm

We're continuing to evolve the system as we scale. Future improvements include:


  • ML-based quota prediction and allocation
  • Per-site behavioral modeling to optimize limits
  • Enhanced fraud detection integration


If you're interested in solving similar distributed systems challenges, we're hiring! Check out our careers page.

Note: Thanks to Jane Smith and John Doe for their contributions to this post and the rate limiting system.
