
Redis-backed Rate Limiting in FastAPI (Async, Fixed Window)

Last updated: October 07, 2025

Overview

This guide shows how to implement per-client rate limiting in FastAPI using Redis. We use a simple fixed-window algorithm with atomic counters and response headers, suitable for APIs and internal automation tasks.

Why Redis?

  • Fast, single source of truth across app instances
  • Atomic counters (INCR) with TTL
  • Low operational overhead; easy to containerize

What you get:

  • Per-IP limit with a 60-second window (configurable)
  • X-RateLimit headers and Retry-After on 429
  • Async Redis client using redis-py (redis.asyncio)

Quickstart

  1. Prerequisites
  • Python 3.10+
  • Redis 6+
  2. Install packages
pip install fastapi uvicorn redis
  3. Start Redis (example with Docker)
docker run --rm -p 6379:6379 redis:7-alpine
  4. Save the minimal example below as main.py
  5. Run the app
uvicorn main:app --reload
  6. Test limits
# Send 5 quick requests and observe headers
for i in $(seq 1 5); do curl -i http://127.0.0.1:8000/; done

Minimal working example

from fastapi import FastAPI, Request, Response, Depends, HTTPException
from redis.asyncio import Redis

LIMIT = 100      # max requests
WINDOW = 60      # seconds

app = FastAPI(title="FastAPI + Redis Rate Limiting")

async def get_redis() -> Redis:
    # Single shared client via app.state
    return app.state.redis

@app.on_event("startup")
async def startup():
    app.state.redis = Redis.from_url(
        "redis://localhost:6379/0",
        encoding="utf-8",
        decode_responses=True,
    )

@app.on_event("shutdown")
async def shutdown():
    await app.state.redis.close()

async def get_client_id(request: Request) -> str:
    # Prefer API key if provided; otherwise use client IP
    api_key = request.headers.get("x-api-key")
    if api_key:
        return f"key:{api_key}"

    # If behind a proxy, ensure trusted and use X-Forwarded-For correctly
    # Minimal: first IP in X-Forwarded-For or fallback to client.host
    xff = request.headers.get("x-forwarded-for")
    if xff:
        return f"ip:{xff.split(",")[0].strip()}"
    return f"ip:{request.client.host}"

async def rate_limiter(
    request: Request,
    response: Response,
    redis: Redis = Depends(get_redis),
):
    client_id = await get_client_id(request)
    key = f"rl:{client_id}"

    # Increment count for this window and set expiry on first hit
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, WINDOW)

    # TTL can be -1 or -2; clamp to window if not set
    ttl = await redis.ttl(key)
    if ttl is None or ttl < 0:
        ttl = WINDOW

    remaining = max(0, LIMIT - count)

    # Standard headers
    response.headers["X-RateLimit-Limit"] = str(LIMIT)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    response.headers["X-RateLimit-Reset"] = str(ttl)

    if count > LIMIT:
        # Return 429 with retry hints
        raise HTTPException(
            status_code=429,
            detail="Too Many Requests",
            headers={
                "Retry-After": str(ttl),
                "X-RateLimit-Limit": str(LIMIT),
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(ttl),
            },
        )

@app.get("/", dependencies=[Depends(rate_limiter)])
async def index():
    return {"ok": True, "message": "Hello with rate limiting"}

@app.get("/status", dependencies=[Depends(rate_limiter)])
async def status():
    return {"status": "green"}

How it works

  • Fixed window: we count requests in a WINDOW seconds bucket using a single key rl:{client}.
  • Redis INCR increments the counter atomically; EXPIRE sets the window.
  • When the counter exceeds LIMIT, return 429 with Retry-After and X-RateLimit headers.

Headers used:

  • X-RateLimit-Limit: max requests per window
  • X-RateLimit-Remaining: remaining requests in the current window
  • X-RateLimit-Reset: seconds until the window resets
  • Retry-After: seconds to wait (on 429)

Key design examples:

  • Global per client: rl:ip:1.2.3.4
  • Per route: rl:ip:1.2.3.4:/search (append request.url.path)
  • Per API key: rl:key:abc123
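
For example, a per-route variant can reuse the pieces from the minimal example and only change how the key is built (a sketch; rate_limiter_per_route is an illustrative name):

async def rate_limiter_per_route(
    request: Request,
    response: Response,
    redis: Redis = Depends(get_redis),
):
    client_id = await get_client_id(request)
    # Scope the counter to this client AND this path, e.g. rl:ip:1.2.3.4:/search
    key = f"rl:{client_id}:{request.url.path}"
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, WINDOW)
    # ...continue with the same header and 429 logic as rate_limiter above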

Applying limits per-route or globally

  • Per-route: add dependencies=[Depends(rate_limiter)] to each path operation.
  • Per-router: create an APIRouter with dependencies=[Depends(rate_limiter)] and include it in app.
  • Global middleware: implement a Starlette middleware that calls the limiter for all requests (including non-FastAPI routes).
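
For instance, the per-router option could look like this (a sketch building on the minimal example; reports is a hypothetical router):

from fastapi import APIRouter

# Every route registered on this router runs rate_limiter before the handler
reports = APIRouter(dependencies=[Depends(rate_limiter)])

@reports.get("/reports/daily")
async def daily_report():
    return {"report": "daily"}

app.include_router(reports)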

Testing

  • Burst test
for i in $(seq 1 120); do curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/; done

You should see 200 for the first 100, then 429 until the window resets.

  • Header inspection
curl -i http://127.0.0.1:8000/

Confirm X-RateLimit-* headers change as you cross the threshold.
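
If you prefer to script the check in Python, a small standard-library loop works too (a sketch; adjust the URL and request count to your setup):

import urllib.request
import urllib.error

URL = "http://127.0.0.1:8000/"

for i in range(1, 106):
    try:
        resp = urllib.request.urlopen(URL)
        status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as exc:
        # 429 responses raise HTTPError; the rate-limit headers are still attached
        status, headers = exc.code, exc.headers
    print(i, status,
          headers.get("X-RateLimit-Remaining"),
          headers.get("Retry-After"))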

Pitfalls and edge cases

  • Proxies and real IP: If you’re behind a load balancer, use the trusted client IP (e.g., first hop from X-Forwarded-For) or prefer authenticated identifiers (API keys, user IDs) to avoid IP sharing issues.
  • Expire race on first request: INCR then EXPIRE is typically fine, but in extreme races some keys might miss a TTL. For strict guarantees, use a Lua script that does INCR+PEXPIRE atomically (see the sketch after this list).
  • TTL anomalies: Redis TTL can be -1 (no expiry) or -2 (missing). Clamp or reset TTL if needed, as in the example.
  • Hot keys: A single key under heavy load can become a bottleneck. Consider sharding keys (e.g., per-route) or using a sliding window algorithm to smooth bursts.
  • Multiple app instances: This setup handles them by design; Redis coordinates counts across instances. Ensure all instances point at the same Redis.
  • IPv6 and formatting: Normalize addresses to avoid multiple representations of the same client.
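
A minimal sketch of that Lua approach (the script body and the helper name incr_with_window are illustrative):

# INCR and PEXPIRE run as one atomic server-side script, so the key can never
# be left without a TTL even if two "first" requests race.
INCR_WITH_EXPIRE = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return current
"""

async def incr_with_window(redis: Redis, key: str, window_seconds: int) -> int:
    # eval(script, numkeys, *keys_and_args) executes the script atomically
    return await redis.eval(INCR_WITH_EXPIRE, 1, key, window_seconds * 1000)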

Performance notes

  • Use a single Redis client per process; avoid creating a client per request.
  • Prefer Lua scripting or pipelines to reduce round-trips when you add features (e.g., multiple counters per request); a pipeline sketch follows this list.
  • Keep windows short to limit the number of live keys. Expiring keys reduces memory pressure.
  • Consider Redis Cluster or a managed instance if sustained QPS is high; INCR is O(1) but network latency dominates.
  • Compute the TTL once per request and reuse it for all headers; avoid redundant Redis calls when optimizing.
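
For example, the INCR and TTL calls from the minimal example can be combined into a single round-trip with a pipeline (a sketch; count_and_ttl is an illustrative helper):

async def count_and_ttl(redis: Redis, key: str) -> tuple[int, int]:
    # Queue both commands and send them together
    async with redis.pipeline(transaction=True) as pipe:
        pipe.incr(key)
        pipe.ttl(key)
        count, ttl = await pipe.execute()
    if count == 1:
        # First hit in the window: set the expiry (or fold this into a Lua script)
        await redis.expire(key, WINDOW)
        ttl = WINDOW
    elif ttl < 0:
        ttl = WINDOW
    return count, ttl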

Variations and upgrades

  • Sliding window: Track timestamps in a sorted set per client and trim old entries. More accurate but heavier (memory/CPU); see the sketch after this list.
  • Token bucket: Store tokens per client in Redis; refill at a steady rate via scripts or background tasks; allows short bursts.
  • Per-endpoint limits: Include the route path in the key to limit expensive endpoints more strictly.
  • User vs IP limits: Prefer authenticated identifiers to avoid penalizing NATed users.
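
A sketch of the sliding-window variant mentioned above, using a sorted set of request timestamps (sliding_window_allow is an illustrative helper that returns whether the request is allowed):

import time

async def sliding_window_allow(redis: Redis, client_id: str,
                               limit: int = 100, window: int = 60) -> bool:
    key = f"rl:sw:{client_id}"
    now = time.time()
    async with redis.pipeline(transaction=True) as pipe:
        pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than the window
        pipe.zadd(key, {str(now): now})              # record this request (same-timestamp collisions are rare)
        pipe.zcard(key)                              # count requests still inside the window
        pipe.expire(key, window)                     # let idle clients' keys expire
        _, _, count, _ = await pipe.execute()
    return count <= limit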

Algorithm comparison (quick glance):

  Algorithm        Accuracy  Cost    Allows bursts
  Fixed window     Medium    Low     Yes
  Sliding window   High      Medium  Less
  Token bucket     High      Medium  Yes

Small enhancements

  • Add a config class or environment variables for LIMIT/WINDOW.
  • Add an allowlist (e.g., internal health checks) by skipping the limiter for certain client IDs.
  • Add structured logging when 429 is returned.
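
A combined sketch of the first two ideas (the environment variable names and the allowlist format are illustrative):

import os

LIMIT = int(os.getenv("RATE_LIMIT", "100"))
WINDOW = int(os.getenv("RATE_WINDOW", "60"))
# Comma-separated client IDs to skip, e.g. "ip:10.0.0.5,key:internal-healthcheck"
ALLOWLIST = set(filter(None, os.getenv("RATE_ALLOWLIST", "").split(",")))

async def rate_limiter(request: Request, response: Response,
                       redis: Redis = Depends(get_redis)):
    client_id = await get_client_id(request)
    if client_id in ALLOWLIST:
        return  # exempt internal callers entirely
    # ...continue with the same counting and header logic as the minimal example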

Tiny FAQ

  • Q: Does this work across multiple FastAPI instances?
    • A: Yes. Redis centralizes the counter, so all instances share the same limits.
  • Q: Can I exempt authenticated users?
    • A: Yes. In rate_limiter, detect the user or API key and skip or raise the threshold for trusted accounts.
  • Q: Why not store the window index in the key?
    • A: Using a single key with TTL is simpler and avoids orphaned keys; both approaches are valid.
  • Q: How do I return JSON on 429 with headers?
    • A: The example uses HTTPException with headers; FastAPI returns JSON with the detail and sets the headers.

Series: Automate boring tasks with Python
