The Challenge
Our platform allows users to automate actions on third-party websites through browser automation. Each target website has its own rate limits and lockout policies. We need to:
- Respect per-site rate limits to avoid getting user accounts locked
- Fairly distribute quota across all users accessing a given site
- Coordinate limits across multiple regions in real time
- Handle failure modes gracefully without compromising limits
The naive approach of a single centralized rate limiter wouldn't work because of cross-region latency, while fully independent per-region limiters could exceed quotas globally. We needed a system that could coordinate globally while still making fast local decisions.
Our Solution: Hierarchical Token Buckets
We implemented a hierarchical token bucket system with three levels:
```go
type RateLimiter struct {
	// Global bucket coordinated across regions
	globalBucket *TokenBucket
	// Per-region bucket for local rate limiting
	regionalBucket *TokenBucket
	// Per-user bucket for fair distribution
	userBuckets map[string]*TokenBucket
}
```
Each bucket implements the token bucket algorithm but with different refill rates and coordination mechanisms:
- Global buckets track the overall quota for a site across all regions
- Regional buckets get a portion of the global quota based on traffic patterns
- User buckets ensure fair distribution within a region
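All three levels share the same underlying primitive. As a rough sketch of what a `tryAcquire`-style token bucket looks like (the field names `capacity` and `refillRate` here are illustrative, not our production struct):

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// TokenBucket is a minimal sketch of the token bucket algorithm: tokens
// refill continuously at refillRate per second, up to capacity, and each
// request consumes one token if available.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64 // maximum tokens the bucket can hold
	tokens     float64 // current token count
	refillRate float64 // tokens added per second
	lastRefill time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{
		capacity:   capacity,
		tokens:     capacity, // start full
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

// tryAcquire refills the bucket based on elapsed time, then takes one
// token if available. It never blocks.
func (b *TokenBucket) tryAcquire() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	elapsed := now.Sub(b.lastRefill).Seconds()
	b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.refillRate)
	b.lastRefill = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// A bucket holding 2 tokens and refilling 1 token/sec: the third
	// immediate acquire should fail.
	b := NewTokenBucket(2, 1)
	fmt.Println(b.tryAcquire(), b.tryAcquire(), b.tryAcquire()) // prints: true true false
}
```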
The key innovation is how we coordinate the global bucket across regions. Rather than trying to maintain perfect consistency, we use a probabilistic quota allocation system:
```go
func (r *RateLimiter) acquireToken(userID string) bool {
	// Check user bucket first (fast local check)
	userBucket, ok := r.userBuckets[userID]
	if !ok || !userBucket.tryAcquire() {
		return false
	}

	// Check regional bucket
	if !r.regionalBucket.tryAcquire() {
		return false
	}

	// Probabilistic global check based on the region's quota share
	quotaShare := r.getRegionalQuotaShare()
	if rand.Float64() < quotaShare {
		// Skip global coordination for this request
		return true
	}

	// Do global coordination for the remaining subset of requests
	return r.globalBucket.tryAcquireWithTimeout(50 * time.Millisecond)
}
```
This approach lets us make fast local decisions most of the time while still maintaining global limits probabilistically. The quota share for each region is continuously adjusted based on observed traffic patterns using a gossip protocol between regions.
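To give a flavor of that adjustment, here is a minimal sketch of deriving a region's share from gossiped traffic observations (the `recomputeQuotaShare` helper, its smoothing factor, and the peer-rate map are hypothetical illustrations, not our gossip implementation):

```go
package main

import "fmt"

// recomputeQuotaShare sketches how a region might update its share of the
// global quota: take its observed fraction of total traffic, then smooth
// it exponentially so short spikes don't swing the allocation.
func recomputeQuotaShare(localRate float64, peerRates map[string]float64, prevShare, alpha float64) float64 {
	total := localRate
	for _, r := range peerRates {
		total += r
	}
	if total == 0 {
		// No observed traffic anywhere; keep the previous share.
		return prevShare
	}
	observed := localRate / total
	return alpha*observed + (1-alpha)*prevShare
}

func main() {
	// This region observed 600 req/s; peers gossiped 300 and 100 req/s.
	peers := map[string]float64{"eu-west": 300, "ap-south": 100}
	share := recomputeQuotaShare(600, peers, 0.5, 0.5)
	fmt.Printf("%.2f\n", share) // observed fraction 0.60, smoothed with previous 0.50
}
```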
Handling Edge Cases
One key challenge was handling failure modes gracefully. If global coordination fails, we fall back to conservative local limits:
```go
func (r *RateLimiter) getEffectiveQuota() int64 {
	if !r.globalHealthy() {
		// Fall back to a conservative local quota
		return r.regionalBucket.capacity / 2
	}
	return r.regionalBucket.capacity
}
```
We also implemented automatic recovery mechanisms for when accounts hit rate limits:
- Exponential backoff when approaching limits
- Circuit breakers to prevent cascading failures
- Automatic quota redistribution when regions fail
Results
This system has been running in production for over 6 months, handling millions of requests per day across 12 regions. Key metrics:
- 99.9th percentile latency overhead < 1ms
- Global quota accuracy within 5% of target
- Zero account lockouts due to rate limit violations
- Automatic recovery from regional failures in < 10 seconds
Lessons Learned
Building a distributed rate limiting system taught us several lessons:
- Perfect consistency isn't always necessary - probabilistic approaches can work well
- Local decision making with periodic global coordination provides a good trade-off
- Conservative fallbacks are critical for reliability
- Monitoring and automatic recovery are as important as the core algorithm
We're continuing to evolve the system as we scale. Future improvements include:
- ML-based quota prediction and allocation
- Per-site behavioral modeling to optimize limits
- Enhanced fraud detection integration
If you're interested in solving similar distributed systems challenges, we're hiring! Check out our careers page.
Note: Thanks to Jane Smith and John Doe for their contributions to this post and the rate limiting system.