Rate limiting protects your AI infrastructure from excessive usage, runaway scripts, and unexpected cost spikes. Raven enforces rate limits at the virtual key level using Redis-backed counters.

How It Works

Each virtual key can have two independent rate limits:
| Limit | Window | Purpose |
| --- | --- | --- |
| RPM (Requests Per Minute) | 60 seconds | Controls burst traffic |
| RPD (Requests Per Day) | 86,400 seconds | Controls total daily usage |
Both limits are checked in parallel when a request arrives. If either limit is exceeded, the request is rejected immediately — before any guardrail evaluation or provider call is made.
If neither RPM nor RPD is set on a virtual key, rate limiting is skipped entirely for that key.
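
The check order described above can be sketched as follows. This is a minimal illustration, not Raven's implementation: the `KeyLimits`, `Consume`, and `Decision` types and the `checkRateLimits` function are hypothetical names, and the `consume` callback stands in for whatever counter backend is in use.

```typescript
// Hypothetical shapes for illustration; not Raven's internal types.
interface KeyLimits {
  keyId: string;
  rpm?: number; // requests per minute; undefined means no RPM limit
  rpd?: number; // requests per day; undefined means no RPD limit
}

type LimitName = "rpm" | "rpd";

// Counts one request against a limit; resolves to true if the request fits.
type Consume = (limit: LimitName, keyId: string, max: number) => Promise<boolean>;

type Decision =
  | { allowed: true; skipped: boolean }
  | { allowed: false; exceeded: LimitName };

async function checkRateLimits(key: KeyLimits, consume: Consume): Promise<Decision> {
  const checks: Array<Promise<{ limit: LimitName; ok: boolean }>> = [];
  if (key.rpm !== undefined) {
    checks.push(consume("rpm", key.keyId, key.rpm).then((ok) => ({ limit: "rpm" as const, ok })));
  }
  if (key.rpd !== undefined) {
    checks.push(consume("rpd", key.keyId, key.rpd).then((ok) => ({ limit: "rpd" as const, ok })));
  }

  // Neither limit is configured: rate limiting is skipped for this key.
  if (checks.length === 0) {
    return { allowed: true, skipped: true };
  }

  // Both configured limits are checked in parallel; either one can reject.
  const results = await Promise.all(checks);
  const failed = results.find((r) => !r.ok);
  if (failed !== undefined) {
    return { allowed: false, exceeded: failed.limit };
  }
  return { allowed: true, skipped: false };
}
```

A caller would run this before guardrail evaluation and the provider call, and return a 429 when `allowed` is false.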

Configuring Rate Limits

Set rate limits when creating or updating a virtual key in the dashboard or via the API.
1. Navigate to Keys: go to Keys in the dashboard sidebar.
2. Create or Edit a Key: click Create Key or select an existing key to edit.
3. Set RPM and RPD: enter your desired requests per minute and requests per day values. Leave blank for unlimited.

API Configuration

{
  "name": "production-app",
  "environment": "live",
  "rateLimitRpm": 600,
  "rateLimitRpd": 100000
}

Example Configurations

| Use Case | RPM | RPD | Rationale |
| --- | --- | --- | --- |
| Development key | 30 | 1,000 | Light usage for testing |
| Production key | 600 | 100,000 | Typical production workload |
| Batch processing | 120 | 500,000 | High daily volume, moderate burst |
| Internal tool | 10 | 500 | Low-frequency, cost-controlled |
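
When choosing a pair of values, it can help to work out how the two limits interact. The sketch below is plain arithmetic on the example values. The helper names are hypothetical, and it assumes a client sending at exactly the RPM limit.

```typescript
// Field names follow the API payload; the helpers are illustrative only.
interface LimitConfig {
  name: string;
  rateLimitRpm: number;
  rateLimitRpd: number;
}

const MINUTES_PER_DAY = 1440;

// Minutes of sustained traffic at the full RPM rate before the daily cap is reached.
function minutesToExhaustRpd(c: LimitConfig): number {
  return c.rateLimitRpd / c.rateLimitRpm;
}

// Approximate ceiling on daily requests when the key runs at its RPM limit all day.
function maxDailyAtRpm(c: LimitConfig): number {
  return c.rateLimitRpm * MINUTES_PER_DAY;
}

// True when the daily cap can be reached before the RPM limit caps the day.
function rpdIsBinding(c: LimitConfig): boolean {
  return c.rateLimitRpd < maxDailyAtRpm(c);
}

const development: LimitConfig = { name: "Development key", rateLimitRpm: 30, rateLimitRpd: 1000 };
const production: LimitConfig = { name: "Production key", rateLimitRpm: 600, rateLimitRpd: 100000 };
const batch: LimitConfig = { name: "Batch processing", rateLimitRpm: 120, rateLimitRpd: 500000 };
const internalTool: LimitConfig = { name: "Internal tool", rateLimitRpm: 10, rateLimitRpd: 500 };

for (const c of [development, production, batch, internalTool]) {
  console.log(
    `${c.name}: daily cap reached after ${minutesToExhaustRpd(c).toFixed(1)} min at full RPM; ` +
      `max/day at RPM = ${maxDailyAtRpm(c)}; RPD binding = ${rpdIsBinding(c)}`,
  );
}
```

Under this simple model, the production values allow about 167 minutes of traffic at the full 600 RPM before the daily cap applies. For the batch row, 120 RPM sustained for a whole day comes to roughly 172,800 requests, so the 500,000 RPD value would not be reached and the RPM limit is the effective daily ceiling.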

Token Bucket via Redis

Rate limits are enforced using the token bucket algorithm backed by Redis via the rate-limiter-flexible library. This provides:
  • Distributed enforcement — Works across multiple Raven instances sharing the same Redis
  • Sub-millisecond latency — Redis operations add minimal overhead to each request
  • Atomic operations — No race conditions under high concurrency
  • Automatic expiry — Counters reset naturally when the time window elapses

How Counters Work

RPM counter:  rl:rpm:{keyId}  --> points: {rpm}, duration: 60s
RPD counter:  rl:rpd:{keyId}  --> points: {rpd}, duration: 86400s
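Each request consumes one point. When all points for the current window are used, subsequent requests are rejected until the window elapses. At that point the counter expires and the full allowance is available again; points are not refilled gradually within a window.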
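
The counter behavior can be modelled with a small in-memory stand-in. This sketch does not use Redis or the rate-limiter-flexible API. The `WindowCounter` class, the injected clock, and the key id `key_123` are hypothetical, and it assumes the window opens on the first request and the allowance returns when the window expires, as described on this page.

```typescript
// In-memory stand-in for the Redis-backed counters; for illustration only.
class WindowCounter {
  private readonly windows = new Map<string, { used: number; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so the window expiry can be demonstrated without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Consumes one point for `key`; returns false when the window's points are used up.
  consume(key: string, points: number, durationSec: number): boolean {
    const t = this.now();
    let w = this.windows.get(key);
    if (w === undefined || t >= w.expiresAt) {
      // First request of a new window: start counting and set the expiry.
      w = { used: 0, expiresAt: t + durationSec * 1000 };
      this.windows.set(key, w);
    }
    if (w.used >= points) {
      return false;
    }
    w.used += 1;
    return true;
  }
}

let fakeNowMs = 0;
const counter = new WindowCounter(() => fakeNowMs);
const rpmKey = "rl:rpm:key_123"; // same naming scheme as above, hypothetical key id

// A tiny limit of 2 points per 60 s keeps the demonstration short.
const first = counter.consume(rpmKey, 2, 60); // allowed
const second = counter.consume(rpmKey, 2, 60); // allowed
const third = counter.consume(rpmKey, 2, 60); // rejected: no points left
fakeNowMs = 60000; // the 60 s window has elapsed
const afterReset = counter.consume(rpmKey, 2, 60); // allowed again
console.log(first, second, third, afterReset);
```

In a multi-instance deployment the map would be replaced by shared storage with atomic increments, which is the role Redis plays here.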

429 Responses

When a rate limit is exceeded, Raven returns a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded (requests per minute)",
    "code": "RATE_LIMITED"
  }
}
Or for daily limits:
{
  "error": {
    "message": "Rate limit exceeded (requests per day)",
    "code": "RATE_LIMITED"
  }
}
When you receive a 429, wait before retrying. An RPM limit clears within 60 seconds at most, so a short wait is usually sufficient. An RPD limit clears only when the daily counter resets, 24 hours after the first request in the current window, so retrying sooner will keep failing.
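
A client can tell the two cases apart before deciding how long to wait. The sketch below is one possible approach, with hypothetical helper names. Both responses share the `RATE_LIMITED` code, so it falls back to matching the message text shown above, which is an assumption that may not hold if the wording changes.

```typescript
// Shape of the error body shown above.
interface RavenErrorBody {
  error: { message: string; code: string };
}

type ExceededLimit = "rpm" | "rpd" | "unknown";

// Returns which limit was hit, or null if the response is not a rate-limit error.
function classifyRateLimit(status: number, body: RavenErrorBody): ExceededLimit | null {
  if (status !== 429 || body.error.code !== "RATE_LIMITED") {
    return null;
  }
  // Assumption: the message wording matches the examples on this page.
  if (body.error.message.includes("requests per minute")) {
    return "rpm";
  }
  if (body.error.message.includes("requests per day")) {
    return "rpd";
  }
  return "unknown";
}

// Longest possible wait until the counter resets, in seconds (null if unknown).
function maxWaitSeconds(limit: ExceededLimit): number | null {
  if (limit === "rpm") {
    return 60;
  }
  if (limit === "rpd") {
    return 86400;
  }
  return null;
}

const sample: RavenErrorBody = {
  error: { message: "Rate limit exceeded (requests per day)", code: "RATE_LIMITED" },
};
const hit = classifyRateLimit(429, sample);
console.log(hit, hit === null ? null : maxWaitSeconds(hit));
```

For an `rpd` result it is usually better to surface the error to the caller than to keep retrying in a loop.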

Rate Limits vs. Budgets

Rate limits and budgets serve different purposes:
FeatureRate LimitsBudgets
UnitRequest countDollar amount
ScopePer virtual keyOrg, team, or key
WindowMinute / dayDay / week / month
PurposeThroughput controlCost control
Use rate limits to prevent traffic spikes. Use budgets to prevent cost overruns.

Monitoring Rate Limits

Rate limit events are tracked in Raven’s observability layer:
  • Prometheus: raven_rate_limit_exceeded_total counter with key_id label
  • Events: key.rate_limited events are emitted and available via the event stream
  • Analytics: Rate-limited requests appear in the dashboard analytics with a 429 status code

Best Practices

  • RPM alone does not prevent a key from being used all day at a moderate rate. Combine RPM for burst protection with RPD for daily caps.
  • Give each application or team its own virtual key with appropriate limits. This prevents one workload from starving another.
  • Begin with lower limits and increase them based on observed usage patterns in the analytics dashboard.
  • Implement exponential backoff in your client code. The Raven SDK handles this automatically.
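
For clients that do not use the SDK, exponential backoff might look like the sketch below. It is not the SDK's implementation: the function names, the 500 ms base, the 60-second cap, and the use of full jitter are all assumptions chosen for illustration. The `send` callback stands in for whatever HTTP call the client makes.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped, with full jitter.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Calls `send` until it returns a non-429 status or `maxAttempts` calls have been made.
async function withRetry<T>(
  send: () => Promise<{ status: number; value: T }>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    }),
): Promise<{ status: number; value: T }> {
  let result = await send();
  for (let attempt = 0; result.status === 429 && attempt < maxAttempts - 1; attempt++) {
    await sleep(backoffDelayMs(attempt));
    result = await send();
  }
  // Either a non-429 response or the last 429 after giving up.
  return result;
}
```

The cap matches the RPM window, so retries after a per-minute rejection can succeed within a few attempts. A daily-limit rejection will outlast any reasonable retry budget and is returned to the caller as the final 429.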