Overview
This guide shows how to implement per-client rate limiting in FastAPI using Redis. We use a simple fixed-window algorithm with atomic counters and response headers, suitable for APIs and internal automation tasks.
Why Redis?
- Fast, single-source of truth across app instances
- Atomic counters (INCR) with TTL
- Low operational overhead; easy to containerize
What you get:
- Per-IP limit with a 60-second window (configurable)
- X-RateLimit headers and Retry-After on 429
- Async Redis client using redis-py (redis.asyncio)
Quickstart
- Prerequisites
- Python 3.10+
- Redis 6+
- Install packages
pip install fastapi uvicorn redis
- Start Redis (example with Docker)
docker run --rm -p 6379:6379 redis:7-alpine
- Save the minimal example below as main.py
- Run the app
uvicorn main:app --reload
- Test limits
# Send 5 quick requests and observe headers
for i in $(seq 1 5); do curl -i http://127.0.0.1:8000/; done
Minimal working example
from fastapi import FastAPI, Request, Response, Depends, HTTPException
from redis.asyncio import Redis

LIMIT = 100  # max requests per window
WINDOW = 60  # window length in seconds

app = FastAPI(title="FastAPI + Redis Rate Limiting")

async def get_redis() -> Redis:
    # Single shared client stored on app.state
    return app.state.redis

@app.on_event("startup")
async def startup():
    app.state.redis = Redis.from_url(
        "redis://localhost:6379/0",
        encoding="utf-8",
        decode_responses=True,
    )

@app.on_event("shutdown")
async def shutdown():
    await app.state.redis.aclose()

async def get_client_id(request: Request) -> str:
    # Prefer an API key if provided; otherwise fall back to the client IP
    api_key = request.headers.get("x-api-key")
    if api_key:
        return f"key:{api_key}"
    # If behind a trusted proxy, take the first IP from X-Forwarded-For;
    # otherwise use the direct peer address
    xff = request.headers.get("x-forwarded-for")
    if xff:
        return f"ip:{xff.split(',')[0].strip()}"
    return f"ip:{request.client.host}"

async def rate_limiter(
    request: Request,
    response: Response,
    redis: Redis = Depends(get_redis),
):
    client_id = await get_client_id(request)
    key = f"rl:{client_id}"
    # Increment the counter for this window; set expiry on the first hit
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, WINDOW)
    # TTL can be -1 (no expiry) or -2 (missing key); clamp to the window
    ttl = await redis.ttl(key)
    if ttl < 0:
        ttl = WINDOW
    remaining = max(0, LIMIT - count)
    # Standard rate-limit headers on every response
    response.headers["X-RateLimit-Limit"] = str(LIMIT)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    response.headers["X-RateLimit-Reset"] = str(ttl)
    if count > LIMIT:
        # Reject with 429 and retry hints
        raise HTTPException(
            status_code=429,
            detail="Too Many Requests",
            headers={
                "Retry-After": str(ttl),
                "X-RateLimit-Limit": str(LIMIT),
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(ttl),
            },
        )

@app.get("/", dependencies=[Depends(rate_limiter)])
async def index():
    return {"ok": True, "message": "Hello with rate limiting"}

@app.get("/status", dependencies=[Depends(rate_limiter)])
async def status():
    return {"status": "green"}
How it works
- Fixed window: requests are counted in a WINDOW-second bucket under a single key, rl:{client}.
- Redis INCR increments the counter atomically; EXPIRE sets the window's lifetime.
- Once the counter exceeds LIMIT, requests receive a 429 with Retry-After and X-RateLimit headers until the key expires.
Headers used:
- X-RateLimit-Limit: max requests per window
- X-RateLimit-Remaining: remaining requests in the current window
- X-RateLimit-Reset: seconds until the window resets
- Retry-After: seconds to wait (on 429)
Key design examples:
- Global per client: rl:ip:1.2.3.4
- Per route: rl:ip:1.2.3.4:/search (append request.url.path)
- Per API key: rl:key:abc123
Applying limits per-route or globally
- Per-route: add dependencies=[Depends(rate_limiter)] to each path operation.
- Per-router: create an APIRouter with dependencies=[Depends(rate_limiter)] and include it in app.
- Global middleware: implement a Starlette middleware that calls the limiter for all requests (including non-FastAPI routes).
Testing
- Burst test
for i in $(seq 1 120); do curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/; done
You should see 200 for the first 100, then 429 until the window resets.
- Header inspection
curl -i http://127.0.0.1:8000/
Confirm X-RateLimit-* headers change as you cross the threshold.
Pitfalls and edge cases
- Proxies and real IP: If you’re behind a load balancer, use the trusted client IP (e.g., first hop from X-Forwarded-For) or prefer authenticated identifiers (API keys, user IDs) to avoid IP sharing issues.
- Expire race on first request: INCR then EXPIRE is typically fine, but in extreme races some keys might miss TTL. For strict guarantees, use a Lua script that does INCR+PEXPIRE atomically.
- TTL anomalies: Redis TTL can be -1 (no expiry) or -2 (missing). Clamp or reset TTL if needed, as in the example.
- Hot keys: A single key under heavy load can become a bottleneck. Consider sharding keys (e.g., per-route) or using a sliding window algorithm to smooth bursts.
- Multiple app instances: supported by design; Redis coordinates the counts across instances, so just ensure every instance talks to the same Redis.
- IPv6 and formatting: Normalize addresses to avoid multiple representations of the same client.
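The expire race mentioned above can be closed with a short Lua script; incr_with_ttl is an illustrative helper name, and redis is assumed to be the shared redis.asyncio client from the example:

```python
# Lua executed atomically on the Redis server: INCR and, on the first
# hit only, PEXPIRE happen in a single step, so no key can miss its TTL.
INCR_WITH_TTL = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return count
"""

async def incr_with_ttl(redis, key: str, window_ms: int) -> int:
    # EVAL with 1 key; the window is passed in milliseconds as ARGV[1]
    return await redis.eval(INCR_WITH_TTL, 1, key, window_ms)
```

In rate_limiter you would replace the separate incr/expire calls with a single await incr_with_ttl(redis, key, WINDOW * 1000).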
Performance notes
- Use a single Redis client per process; avoid creating a client per request.
- Prefer Lua scripting or pipelines to reduce round-trips when you add features (e.g., multiple counters per request).
- Keep windows short to limit the number of live keys. Expiring keys reduces memory pressure.
- Consider Redis Cluster or a managed instance if sustained QPS is high; INCR is O(1) but network latency dominates.
- Compute TTL once per request and reuse it for every header; avoid redundant TTL calls when optimizing.
Variations and upgrades
- Sliding window: Track timestamps in a sorted set per client and trim old entries. More accurate but heavier (memory/CPU).
- Token bucket: Store tokens per client in Redis; refill at a steady rate via scripts or background tasks; allows short bursts.
- Per-endpoint limits: Include the route path in the key to limit expensive endpoints more strictly.
- User vs IP limits: Prefer authenticated identifiers to avoid penalizing NATed users.
Algorithm comparison (quick glance):
Algorithm | Accuracy | Cost | Bursts
---|---|---|---
Fixed window | Medium | Low | Yes
Sliding window | High | Medium | Less
Token bucket | High | Medium | Yes
Small enhancements
- Add a config class or environment variables for LIMIT/WINDOW.
- Add an allowlist (e.g., internal health checks) by skipping the limiter for certain client IDs.
- Add structured logging when 429 is returned.
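The first two enhancements can be sketched together; RATE_LIMIT, RATE_WINDOW, and is_exempt are illustrative names chosen for this sketch:

```python
import os

# Read limits from the environment, falling back to the example defaults
LIMIT = int(os.getenv("RATE_LIMIT", "100"))
WINDOW = int(os.getenv("RATE_WINDOW", "60"))

# Client IDs that bypass the limiter entirely (e.g., internal health checks)
ALLOWLIST = {"ip:127.0.0.1"}

def is_exempt(client_id: str) -> bool:
    return client_id in ALLOWLIST
```

In rate_limiter, return early when is_exempt(client_id) is true, before touching Redis.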
Tiny FAQ
Q: Does this work across multiple FastAPI instances?
- A: Yes. Redis centralizes the counter, so all instances share the same limits.
Q: Can I exempt authenticated users?
- A: Yes. In rate_limiter, detect the user or API key and skip or raise the threshold for trusted accounts.
Q: Why not store the window index in the key?
- A: Using a single key with TTL is simpler and avoids orphaned keys; both approaches are valid.
Q: How do I return JSON on 429 with headers?
- A: The example uses HTTPException with headers; FastAPI returns JSON with the detail and sets the headers.