
Redis-backed Rate Limiting in FastAPI (Async, Fixed Window)

Last updated: October 07, 2025

Overview

This guide shows how to implement per-client rate limiting in FastAPI using Redis. We use a simple fixed-window algorithm with atomic counters and response headers, suitable for APIs and internal automation tasks.

Why Redis?

  • Fast, single source of truth across app instances
  • Atomic counters (INCR) with TTL
  • Low operational overhead; easy to containerize

What you get:

  • Per-IP limit with a 60-second window (configurable)
  • X-RateLimit headers and Retry-After on 429
  • Async Redis client using redis-py (redis.asyncio)

Quickstart

  1. Prerequisites
  • Python 3.10+
  • Redis 6+
  2. Install packages
pip install fastapi uvicorn redis
  3. Start Redis (example with Docker)
docker run --rm -p 6379:6379 redis:7-alpine
  4. Save the minimal example below as main.py
  5. Run the app
uvicorn main:app --reload
  6. Test limits
# Send 5 quick requests and observe headers
for i in $(seq 1 5); do curl -i http://127.0.0.1:8000/; done

Minimal working example

from fastapi import FastAPI, Request, Response, Depends, HTTPException
from redis.asyncio import Redis

LIMIT = 100      # max requests
WINDOW = 60      # seconds

app = FastAPI(title="FastAPI + Redis Rate Limiting")

async def get_redis() -> Redis:
    # Single shared client via app.state
    return app.state.redis

@app.on_event("startup")
async def startup():
    app.state.redis = Redis.from_url(
        "redis://localhost:6379/0",
        encoding="utf-8",
        decode_responses=True,
    )

@app.on_event("shutdown")
async def shutdown():
    await app.state.redis.close()

async def get_client_id(request: Request) -> str:
    # Prefer API key if provided; otherwise use client IP
    api_key = request.headers.get("x-api-key")
    if api_key:
        return f"key:{api_key}"

    # If behind a proxy, ensure trusted and use X-Forwarded-For correctly
    # Minimal: first IP in X-Forwarded-For or fallback to client.host
    xff = request.headers.get("x-forwarded-for")
    if xff:
        return f"ip:{xff.split(",")[0].strip()}"
    return f"ip:{request.client.host}"

async def rate_limiter(
    request: Request,
    response: Response,
    redis: Redis = Depends(get_redis),
):
    client_id = await get_client_id(request)
    key = f"rl:{client_id}"

    # Increment count for this window and set expiry on first hit
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, WINDOW)

    # TTL can be -1 or -2; clamp to window if not set
    ttl = await redis.ttl(key)
    if ttl is None or ttl < 0:
        ttl = WINDOW

    remaining = max(0, LIMIT - count)

    # Standard headers
    response.headers["X-RateLimit-Limit"] = str(LIMIT)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    response.headers["X-RateLimit-Reset"] = str(ttl)

    if count > LIMIT:
        # Return 429 with retry hints
        raise HTTPException(
            status_code=429,
            detail="Too Many Requests",
            headers={
                "Retry-After": str(ttl),
                "X-RateLimit-Limit": str(LIMIT),
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(ttl),
            },
        )

@app.get("/", dependencies=[Depends(rate_limiter)])
async def index():
    return {"ok": True, "message": "Hello with rate limiting"}

@app.get("/status", dependencies=[Depends(rate_limiter)])
async def status():
    return {"status": "green"}

How it works

  • Fixed window: we count requests in a WINDOW seconds bucket using a single key rl:{client}.
  • Redis INCR increments the counter atomically; EXPIRE sets the window.
  • When the counter exceeds LIMIT, return 429 with Retry-After and X-RateLimit headers.

Headers used:

  • X-RateLimit-Limit: max requests per window
  • X-RateLimit-Remaining: remaining requests in the current window
  • X-RateLimit-Reset: seconds until the window resets
  • Retry-After: seconds to wait (on 429)

Key design examples:

  • Global per client: rl:ip:1.2.3.4
  • Per route: rl:ip:1.2.3.4:/search (append request.url.path)
  • Per API key: rl:key:abc123
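
For example, a per-route variant can reuse the pieces from the minimal example and only change how the key is built (a sketch; rate_limiter_per_route is an illustrative name):

async def rate_limiter_per_route(
    request: Request,
    response: Response,
    redis: Redis = Depends(get_redis),
):
    client_id = await get_client_id(request)
    # Scope the counter to this client AND this path, e.g. rl:ip:1.2.3.4:/search
    key = f"rl:{client_id}:{request.url.path}"
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, WINDOW)
    # ...continue with the same header and 429 logic as rate_limiter above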

Applying limits per-route or globally

  • Per-route: add dependencies=[Depends(rate_limiter)] to each path operation.
  • Per-router: create an APIRouter with dependencies=[Depends(rate_limiter)] and include it in app.
  • Global middleware: implement a Starlette middleware that calls the limiter for all requests (including non-FastAPI routes).
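
For instance, the per-router option could look like this (a sketch building on the minimal example; reports is a hypothetical router):

from fastapi import APIRouter

# Every route registered on this router runs rate_limiter before the handler
reports = APIRouter(dependencies=[Depends(rate_limiter)])

@reports.get("/reports/daily")
async def daily_report():
    return {"report": "daily"}

app.include_router(reports)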

Testing

  • Burst test
for i in $(seq 1 120); do curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/; done

You should see 200 for the first 100, then 429 until the window resets.

  • Header inspection
curl -i http://127.0.0.1:8000/

Confirm X-RateLimit-* headers change as you cross the threshold.
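
If you prefer to script the check in Python, a small standard-library loop works too (a sketch; adjust the URL and request count to your setup):

import urllib.request
import urllib.error

URL = "http://127.0.0.1:8000/"

for i in range(1, 106):
    try:
        resp = urllib.request.urlopen(URL)
        status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as exc:
        # 429 responses raise HTTPError; the rate-limit headers are still attached
        status, headers = exc.code, exc.headers
    print(i, status,
          headers.get("X-RateLimit-Remaining"),
          headers.get("Retry-After"))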

Pitfalls and edge cases

  • Proxies and real IP: If you’re behind a load balancer, use the trusted client IP (e.g., first hop from X-Forwarded-For) or prefer authenticated identifiers (API keys, user IDs) to avoid IP sharing issues.
  • Expire race on first request: INCR then EXPIRE is typically fine, but in extreme races some keys might miss a TTL. For strict guarantees, use a Lua script that does INCR+PEXPIRE atomically (see the sketch after this list).
  • TTL anomalies: Redis TTL can be -1 (no expiry) or -2 (missing). Clamp or reset TTL if needed, as in the example.
  • Hot keys: A single key under heavy load can become a bottleneck. Consider sharding keys (e.g., per-route) or using a sliding window algorithm to smooth bursts.
  • Multiple app instances: This setup handles them by design; Redis coordinates counts across instances. Ensure all instances point at the same Redis.
  • IPv6 and formatting: Normalize addresses to avoid multiple representations of the same client.
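
A minimal sketch of that Lua approach (the script body and the helper name incr_with_window are illustrative):

# INCR and PEXPIRE run as one atomic server-side script, so the key can never
# be left without a TTL even if two "first" requests race.
INCR_WITH_EXPIRE = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return current
"""

async def incr_with_window(redis: Redis, key: str, window_seconds: int) -> int:
    # eval(script, numkeys, *keys_and_args) executes the script atomically
    return await redis.eval(INCR_WITH_EXPIRE, 1, key, window_seconds * 1000)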

Performance notes

  • Use a single Redis client per process; avoid creating a client per request.
  • Prefer Lua scripting or pipelines to reduce round-trips when you add features (e.g., multiple counters per request); a pipeline sketch follows this list.
  • Keep windows short to limit the number of live keys. Expiring keys reduces memory pressure.
  • Consider Redis Cluster or a managed instance if sustained QPS is high; INCR is O(1) but network latency dominates.
  • Compute the TTL once per request and reuse it for all headers; avoid redundant Redis calls when optimizing.
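
For example, the INCR and TTL calls from the minimal example can be combined into a single round-trip with a pipeline (a sketch; count_and_ttl is an illustrative helper):

async def count_and_ttl(redis: Redis, key: str) -> tuple[int, int]:
    # Queue both commands and send them together
    async with redis.pipeline(transaction=True) as pipe:
        pipe.incr(key)
        pipe.ttl(key)
        count, ttl = await pipe.execute()
    if count == 1:
        # First hit in the window: set the expiry (or fold this into a Lua script)
        await redis.expire(key, WINDOW)
        ttl = WINDOW
    elif ttl < 0:
        ttl = WINDOW
    return count, ttl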

Variations and upgrades

  • Sliding window: Track timestamps in a sorted set per client and trim old entries. More accurate but heavier (memory/CPU); see the sketch after this list.
  • Token bucket: Store tokens per client in Redis; refill at a steady rate via scripts or background tasks; allows short bursts.
  • Per-endpoint limits: Include the route path in the key to limit expensive endpoints more strictly.
  • User vs IP limits: Prefer authenticated identifiers to avoid penalizing NATed users.
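
A sketch of the sliding-window variant mentioned above, using a sorted set of request timestamps (sliding_window_allow is an illustrative helper that returns whether the request is allowed):

import time

async def sliding_window_allow(redis: Redis, client_id: str,
                               limit: int = 100, window: int = 60) -> bool:
    key = f"rl:sw:{client_id}"
    now = time.time()
    async with redis.pipeline(transaction=True) as pipe:
        pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than the window
        pipe.zadd(key, {str(now): now})              # record this request (same-timestamp collisions are rare)
        pipe.zcard(key)                              # count requests still inside the window
        pipe.expire(key, window)                     # let idle clients' keys expire
        _, _, count, _ = await pipe.execute()
    return count <= limit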

Algorithm comparison (quick glance):

  Algorithm        Accuracy  Cost    Allows bursts
  Fixed window     Medium    Low     Yes
  Sliding window   High      Medium  Less
  Token bucket     High      Medium  Yes

Small enhancements

  • Add a config class or environment variables for LIMIT/WINDOW.
  • Add an allowlist (e.g., internal health checks) by skipping the limiter for certain client IDs.
  • Add structured logging when 429 is returned.
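
A combined sketch of the first two ideas (the environment variable names and the allowlist format are illustrative):

import os

LIMIT = int(os.getenv("RATE_LIMIT", "100"))
WINDOW = int(os.getenv("RATE_WINDOW", "60"))
# Comma-separated client IDs to skip, e.g. "ip:10.0.0.5,key:internal-healthcheck"
ALLOWLIST = set(filter(None, os.getenv("RATE_ALLOWLIST", "").split(",")))

async def rate_limiter(request: Request, response: Response,
                       redis: Redis = Depends(get_redis)):
    client_id = await get_client_id(request)
    if client_id in ALLOWLIST:
        return  # exempt internal callers entirely
    # ...continue with the same counting and header logic as the minimal example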

Tiny FAQ

  • Q: Does this work across multiple FastAPI instances?
    • A: Yes. Redis centralizes the counter, so all instances share the same limits.
  • Q: Can I exempt authenticated users?
    • A: Yes. In rate_limiter, detect the user or API key and skip or raise the threshold for trusted accounts.
  • Q: Why not store the window index in the key?
    • A: Using a single key with TTL is simpler and avoids orphaned keys; both approaches are valid.
  • Q: How do I return JSON on 429 with headers?
    • A: The example uses HTTPException with headers; FastAPI returns JSON with the detail and sets the headers.

Series: Automate boring tasks with Python
