Lesson 122: Project 3 - Multi-tenant Rate Limit Gateway

1

Lesson Objectives

Understand the multi-tenant rate limiting problem in a production API gateway: authenticate the API key, apply different limits per tier, and reject requests with the correct HTTP status codes.
Learn GCRA (Generic Cell Rate Algorithm) and how to implement it atomically with a Lua script on Redis.
Design a Redis schema for a multi-tenant system: API key, tenant config, GCRA state, daily quota, audit stream.
Write a FastAPI middleware that handles the whole flow: auth → rate limit → daily quota → audit → proxy.
Configure tiered quotas and upgrade a tier by reloading config over Pub/Sub.
Explain the failure modes (Redis down: fail open vs. fail closed) and the circuit breaker approach.
Recognize common gotchas when running a rate limit gateway in production.

2

Project 3 Overview

Project 3 builds a multi-tenant API gateway that sits in front of the upstream services and is responsible for:

Authenticating each request by API key (header X-API-Key).
Looking up the tenant and its tier.
Applying a per-key rate limit with GCRA, which behaves like a token bucket but stores a single timestamp per key and, once the burst allowance is used up, enforces a minimum spacing between requests.
Checking the daily quota accumulated per tenant.
Writing an audit log to a Redis Stream for later lookup.
Proxying valid requests to the upstream.

Redis holds all rate limiting and audit state. Upstream services do not need to know about rate limits; the gateway handles them entirely.

Performance targets: gateway latency overhead (excluding upstream) under 1ms p99, aggregate throughput of 1M+ req/s on a Redis Cluster with 3 masters + 3 replicas.

3

Functional & Non-functional Requirements

Functional requirements

Authenticate with the API key in the X-API-Key header; reject with 401 if it is missing or unknown.
Apply a per-key rate limit: reject with 429 and a Retry-After header when the limit is exceeded.
Tier-based quota: free / paid / enterprise, each with its own req/s, burst, and daily quota.
Proxy valid requests to the upstream service.
Write an audit log entry for every request (timestamp, path, IP) to a Redis Stream, keeping at most the 1000 most recent entries per key.
Support tier upgrades without restarting the gateway (config reload over Pub/Sub).

Non-functional requirements

Gateway latency overhead (excluding upstream): < 1ms p99.
Aggregate throughput: 1M+ req/s across the cluster.
Multi-region: each region has its own Redis Cluster, with eventually consistent sync.
Availability: 99.99% (about 52.6 minutes of downtime per year).

4

Architecture & Tech Stack

Client ──► CDN (anycast) ──► Gateway pods (K8s HPA)
                                       │
                              ┌────────┴────────┐
                              │  Redis Cluster  │
                              │  (rate limit    │
                              │   state + audit)│
                              └────────┬────────┘
                                       │
                              Upstream services
                              (API servers, microservices)

Request flow inside a gateway pod:

Parse X-API-Key and hash it with SHA-256.
HGETALL apikey:<key_hash> to get tenant_id and tier.
Read the tier config from the local cache (TTL 60s). On a cache miss, HGETALL tenant:<tid>:config.
Run the GCRA Lua script atomically on Redis.
If allowed: INCR quota:day:<tid>:<date> and compare against the daily quota.
If allowed: XADD audit:<key_hash> MAXLEN ~ 1000.
Proxy to the upstream.

Tech stack

Gateway: FastAPI (Python 3.12) + redis.asyncio. It can be replaced by a Go service (Envoy-style) or Nginx with Lua for lower latency.
Redis: Redis 7.x Cluster with 3 masters and 3 replicas (cross-AZ). Operator: Redis Operator or the bitnami/redis-cluster Helm chart.
Deployment: Kubernetes, HPA on CPU + RPS, GeoDNS for multi-region.

5

Redis Schema

# API key info (Hash)
apikey:<key_hash>
  tenant_id   → "t_abc123"
  tier        → "paid"
  expires_at  → "1780000000"   # unix timestamp; 0 = never expires

# Tenant tier config (Hash)
tenant:<tid>:config
  rate_per_sec  → "100"
  burst         → "200"
  daily_quota   → "1000000"    # -1 = unlimited (enterprise)

# GCRA state per key (String; TAT = Theoretical Arrival Time, in microseconds)
ratelimit:gcra:<key_hash>
  value: "<tat_us>"
  TTL: burst_period_us, converted to ms (computed in Lua)

# Daily quota counter (String, integer)
quota:day:<tid>:<YYYY-MM-DD>
  value: integer (INCR)
  TTL: 48h (2 days, so the counter cannot expire while its UTC day is still running)

# Audit log per key (Stream)
audit:<key_hash>
  fields: ts, path, ip, method
  MAXLEN ~1000 (approximate trim)

Notes on key naming:

Use key_hash (SHA-256 hex, 64 characters) instead of the raw API key so the secret never shows up in Redis memory, logs, or KEYS/SCAN output.
daily_quota = -1 marks enterprise; the gateway must check this flag and skip the quota comparison.
Clear prefixes (apikey:, tenant:, ratelimit:gcra:, quota:day:, audit:) let ACL patterns match precisely.
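The naming rules above can be collected in one small helper. This is a minimal sketch: the function names hash_api_key and redis_keys and the sample key "sk_live_example" are hypothetical, chosen only for illustration; the key layout itself follows the schema in this section.

```python
from hashlib import sha256


def hash_api_key(api_key: str) -> str:
    """Return the SHA-256 hex digest that stands in for the raw API key."""
    return sha256(api_key.encode("utf-8")).hexdigest()


def redis_keys(api_key: str, tenant_id: str, day: str) -> dict[str, str]:
    """Build every Redis key name the gateway touches for one request.

    `day` is expected to be a UTC ISO date such as "2026-06-01".
    """
    key_hash = hash_api_key(api_key)
    return {
        "apikey": f"apikey:{key_hash}",
        "tenant_config": f"tenant:{tenant_id}:config",
        "gcra": f"ratelimit:gcra:{key_hash}",
        "daily_quota": f"quota:day:{tenant_id}:{day}",
        "audit": f"audit:{key_hash}",
    }


# Hypothetical sample values, for illustration only.
keys = redis_keys("sk_live_example", "t_abc123", "2026-06-01")
for name, value in keys.items():
    print(f"{name:14s} {value}")
```

Keeping the key construction in one place makes it harder for a raw API key to leak into a key name by accident, and gives the ACL patterns a single source of truth.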

6

Tiered Quota Config

TIER_CONFIG = {
    "free": {
        "rate_per_sec": 10,
        "burst": 50,          # burst = maximum number of requests that can be served back to back
        "daily_quota": 10_000,
    },
    "paid": {
        "rate_per_sec": 100,
        "burst": 200,
        "daily_quota": 1_000_000,
    },
    "enterprise": {
        "rate_per_sec": 1000,
        "burst": 5000,
        "daily_quota": -1,    # unlimited
    },
}

async def provision_tenant(redis_client, tenant_id: str, tier: str):
    """Write the config when a tenant is created or changes tier."""
    config = TIER_CONFIG[tier]
    await redis_client.hset(
        f"tenant:{tenant_id}:config",
        mapping={k: str(v) for k, v in config.items()},
    )

The tier config is read into a local in-process cache in each gateway pod with a 60-second TTL. Each pod therefore makes a Redis round trip for a tenant's config about once a minute rather than on every request. When a tenant upgrades its tier, a config reload is triggered over Pub/Sub (see section 10).
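The 60-second in-process cache can be sketched as a small class with an injectable clock, which makes expiry easy to test. The class name TTLCache and its methods are assumptions for illustration and are not part of the gateway code in this lesson.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process cache: tenant_id -> (config, expiry time)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[dict, float]] = {}

    def get(self, tenant_id: str) -> Optional[dict]:
        """Return the cached config, or None if missing or expired."""
        entry = self._entries.get(tenant_id)
        if entry is None:
            return None
        config, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls back to Redis.
            del self._entries[tenant_id]
            return None
        return config

    def put(self, tenant_id: str, config: dict) -> None:
        """Store a config that stays valid for ttl_seconds."""
        self._entries[tenant_id] = (config, self._clock() + self._ttl)

    def invalidate(self, tenant_id: str) -> None:
        """Remove one tenant, e.g. after a tier upgrade."""
        self._entries.pop(tenant_id, None)


cache = TTLCache(ttl_seconds=60.0)
cache.put("t_abc123", {"rate_per_sec": 100, "burst": 200, "daily_quota": 1_000_000})
print(cache.get("t_abc123"))
```

A monotonic clock is used by default so that wall-clock adjustments on the pod do not extend or shorten the TTL.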

7

GCRA Algorithm & Lua Script

GCRA (Generic Cell Rate Algorithm) is a leaky bucket variant that controls both the average rate and the burst with a single formula. It tracks the TAT (Theoretical Arrival Time), the "ideal" time at which the next request is expected. A request is rejected if it arrives earlier than TAT + period - burst_window, which is the allow_at value in the script below.

Advantages over a classic counter-per-window:

No "reset spike" at the window boundary (the problem with a fixed window counter).
Controls the instantaneous rate and the allowed burst at the same time.
Needs only one String key per API key, and the TTL manages itself.

The Lua script guarantees atomicity, so there is no race condition between reading the TAT and writing the new TAT:

-- KEYS[1] = ratelimit:gcra:<key_hash>
-- ARGV[1] = period_us       (1_000_000 / rate_per_sec)
-- ARGV[2] = burst_period_us (period_us * burst)
-- ARGV[3] = now_us          (unix timestamp in microseconds)
--
-- Returns: {1, 0}              if allowed
--          {0, retry_after_us} if denied

local period    = tonumber(ARGV[1])
local burst     = tonumber(ARGV[2])
local now       = tonumber(ARGV[3])

-- Read the current TAT; for a new key, start from now
local tat = tonumber(redis.call("GET", KEYS[1]) or now)

-- TAT must not lie in the past
if tat < now then tat = now end

-- New TAT if this request is accepted
local new_tat = tat + period

-- Earliest time this request may arrive, taking the burst into account
local allow_at = new_tat - burst

if now < allow_at then
    -- Denied: return how long to wait (microseconds)
    return {0, allow_at - now}
end

-- Accepted: store the new TAT; TTL is burst_period_us converted to ms, rounded up
redis.call("SET", KEYS[1], new_tat, "PX", math.ceil(burst / 1000))
return {1, 0}

A few things to keep in mind when using this script:

The time unit is microseconds throughout. Do not mix seconds and microseconds.
PX takes milliseconds: math.ceil(burst / 1000) converts microseconds to milliseconds.
Load the script with SCRIPT LOAD at startup and call it with EVALSHA, so each request sends only the 40-character SHA1 instead of the full script body. Redis caches compiled scripts either way, so the saving is bandwidth and hashing of the body, not recompilation.
The TAT is stored as a string of decimal digits (the string representation of an integer). Do not serialize it as bytes; use tonumber() when reading it.
8

Gateway Request Handler (FastAPI)

import time
import json
from datetime import datetime, timezone
from hashlib import sha256

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import redis.asyncio as aioredis
import httpx

app = FastAPI()

# Redis Cluster client (redis.asyncio.RedisCluster), created at startup
redis_client: aioredis.RedisCluster = None
GCRA_SHA: str = None  # SHA of the Lua script, returned by SCRIPT LOAD
GCRA_LUA: str = ""    # placeholder: must hold the Lua source from section 7
_tenant_cache: dict = {}  # {tenant_id: (config_dict, expire_ts)}
UPSTREAM = "http://upstream-service"

TIER_CONFIG = {
    "free":       {"rate_per_sec": 10,   "burst": 50,   "daily_quota": 10_000},
    "paid":       {"rate_per_sec": 100,  "burst": 200,  "daily_quota": 1_000_000},
    "enterprise": {"rate_per_sec": 1000, "burst": 5000, "daily_quota": -1},
}


@app.on_event("startup")
async def startup():
    global redis_client, GCRA_SHA
    redis_client = aioredis.RedisCluster.from_url(
        "redis://redis-cluster:6379",
        decode_responses=True,
        max_connections=1000,
    )
    # Load the Lua script and keep its SHA for EVALSHA
    GCRA_SHA = await redis_client.script_load(GCRA_LUA)


async def get_tenant_config(tenant_id: str) -> dict:
    """Read the tier config from the local cache (TTL 60s), falling back to Redis."""
    now = time.time()
    cached = _tenant_cache.get(tenant_id)
    if cached and cached[1] > now:
        return cached[0]
    # Cache miss: read from Redis
    raw = await redis_client.hgetall(f"tenant:{tenant_id}:config")
    config = {k: int(v) for k, v in raw.items()}
    _tenant_cache[tenant_id] = (config, now + 60)
    return config


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    api_key = request.headers.get("X-API-Key")
    if not api_key:
        return JSONResponse({"error": "Missing API key"}, status_code=401)

    key_hash = sha256(api_key.encode()).hexdigest()

    # 1. Authenticate: look up tenant + tier
    data = await redis_client.hgetall(f"apikey:{key_hash}")
    if not data:
        return JSONResponse({"error": "Invalid API key"}, status_code=401)

    # expires_at is a unix timestamp; 0 (or missing) means the key never expires
    expires_at = int(data.get("expires_at", 0))
    if expires_at and expires_at <= time.time():
        return JSONResponse({"error": "Expired API key"}, status_code=401)

    tenant_id = data["tenant_id"]
    config = await get_tenant_config(tenant_id)

    # 2. GCRA rate limit
    period_us = 1_000_000 // config["rate_per_sec"]          # microseconds per request
    burst_us  = period_us * config["burst"]                   # burst window in microseconds
    now_us    = int(time.time() * 1_000_000)

    result = await redis_client.evalsha(
        GCRA_SHA, 1,
        f"ratelimit:gcra:{key_hash}",
        period_us, burst_us, now_us,
    )
    allowed, retry_after_us = int(result[0]), int(result[1])

    if not allowed:
        resp = JSONResponse({"error": "Rate limit exceeded"}, status_code=429)
        resp.headers["Retry-After"] = str(int(retry_after_us / 1_000_000) + 1)
        resp.headers["X-RateLimit-Limit"] = str(config["rate_per_sec"])
        return resp

    # 3. Daily quota check
    daily_quota = config["daily_quota"]
    if daily_quota != -1:  # -1 = unlimited (enterprise)
        # Use the UTC date so every pod agrees on the key regardless of its local timezone
        date_key = datetime.now(timezone.utc).date().isoformat()
        count = await redis_client.incr(f"quota:day:{tenant_id}:{date_key}")
        if count == 1:
            # Key was just created: set a 48h TTL (margin for clock skew between pods)
            await redis_client.expire(f"quota:day:{tenant_id}:{date_key}", 172800)
        if count > daily_quota:
            return JSONResponse({"error": "Daily quota exceeded"}, status_code=429)

    # 4. Audit log
    await redis_client.xadd(
        f"audit:{key_hash}",
        {
            "ts": str(now_us),
            "path": request.url.path,
            "method": request.method,
            "ip": request.client.host if request.client else "",
        },
        maxlen=1000,
        approximate=True,
    )

    # 5. Pass the request on to the route that proxies to the upstream
    return await call_next(request)

A few design points:

The API key is hashed with SHA-256 before it is used in a Redis key, so the raw key is never stored in Redis memory.
The tenant config is cached in-process for 60 seconds, which avoids a Redis round trip on every request.
daily_quota == -1 is the sentinel for enterprise unlimited; the gateway skips both the INCR and the comparison.
XADD with approximate=True sends MAXLEN ~1000 rather than MAXLEN 1000, so Redis trims when it is convenient and spends less CPU than an exact trim on every write.

9

Daily Quota & Audit Log

Daily quota with INCR

INCR is atomic, so there is no race condition when many gateway pods count for the same tenant. The standard pattern:

# Do not GET and then SET; INCR is what makes the update atomic
count = await redis_client.incr(f"quota:day:{tenant_id}:{date_key}")
if count == 1:
    # Key was just created for the first time → set the TTL
    await redis_client.expire(f"quota:day:{tenant_id}:{date_key}", 172800)
# Note: a rejected request has already been counted by INCR, so the counter
# keeps growing past daily_quota. This over-count is an acceptable trade-off for simplicity.

The date in the key is the UTC ISO date. A 24-hour TTL counted from the first request of the day would already outlive that UTC day; 48 hours adds a safety margin so that clock skew between gateway pods can never delete the counter early.
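Because the key is tied to the UTC calendar day, it helps to derive both the key and a sensible Retry-After for a daily-quota rejection from one timezone-aware timestamp. The helpers below are a sketch; the names quota_key and seconds_until_utc_midnight are hypothetical, and returning the time until UTC midnight as Retry-After is an assumption, not something the gateway code in this lesson does.

```python
import math
from datetime import datetime, timedelta, timezone


def quota_key(tenant_id: str, now: datetime) -> str:
    """Build quota:day:<tid>:<YYYY-MM-DD> from a timezone-aware datetime.

    The date is always taken in UTC, whatever timezone `now` carries.
    """
    day = now.astimezone(timezone.utc).date().isoformat()
    return f"quota:day:{tenant_id}:{day}"


def seconds_until_utc_midnight(now: datetime) -> int:
    """Seconds until the next UTC day starts, rounded up, at least 1."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return max(1, math.ceil((next_midnight - now_utc).total_seconds()))


# A pod running in UTC+7 still produces the UTC date in the key.
local = datetime(2026, 6, 2, 6, 30, tzinfo=timezone(timedelta(hours=7)))
print(quota_key("t_abc123", local))
print(seconds_until_utc_midnight(local))
```

Passing an aware datetime avoids the pitfall of date.today(), which uses the pod's local timezone and could make two pods write to different daily keys around midnight.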

Audit log with Redis Stream

A Redis Stream (XADD) suits an audit log better than a List because:

The message ID automatically includes a millisecond timestamp, so it does not have to be stored separately.
It supports approximate trimming with MAXLEN ~N, which does not block the pipeline.
XRANGE or XREVRANGE can query by time range.

# Query the 10 most recent entries of one API key
entries = await redis_client.xrevrange(f"audit:{key_hash}", count=10)
for entry_id, fields in entries:
    print(entry_id, fields)

# entry_id looks like "1717200000000-0"; the first part is unix ms, easy to parse as a timestamp

10

Tier Upgrade & Per-endpoint Limit

Tier upgrade flow

async def upgrade_tier(api_key: str, new_tier: str):
    """Upgrade a tenant's tier: write to Redis, then trigger a reload on all gateway pods."""
    if new_tier not in TIER_CONFIG:
        raise ValueError(f"Unknown tier: {new_tier}")

    key_hash = sha256(api_key.encode()).hexdigest()
    data = await redis_client.hgetall(f"apikey:{key_hash}")
    if not data:
        raise KeyError("API key not found")

    tenant_id = data["tenant_id"]

    # Write the new config to Redis
    await redis_client.hset(
        f"tenant:{tenant_id}:config",
        mapping={k: str(v) for k, v in TIER_CONFIG[new_tier].items()},
    )
    # Update the tier field in the apikey hash
    await redis_client.hset(f"apikey:{key_hash}", "tier", new_tier)

    # Publish a signal so every gateway pod invalidates its local cache
    await redis_client.publish(
        "config:reload",
        json.dumps({"tenant": tenant_id}),
    )

Each gateway pod subscribes to the config:reload channel and removes the matching entry from _tenant_cache. The tenant's next request then reads the new config from Redis.
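The subscriber side comes down to one small step: parse the payload and drop the tenant from the local cache. The sketch below shows only that step as a pure function. The name handle_reload_message is hypothetical, and the subscribe loop that feeds it is left out because it depends on how the client exposes Pub/Sub.

```python
import json


def handle_reload_message(cache: dict, payload: str) -> bool:
    """Drop the cached config named in a config:reload payload.

    Expects JSON like {"tenant": "<tenant_id>"}. Returns True if an
    entry was removed, False for unknown tenants or malformed payloads.
    """
    try:
        message = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    tenant_id = message.get("tenant")
    if not isinstance(tenant_id, str):
        return False
    return cache.pop(tenant_id, None) is not None


# Same shape as _tenant_cache: {tenant_id: (config_dict, expire_ts)}
tenant_cache = {
    "t_abc123": ({"rate_per_sec": 10, "burst": 50, "daily_quota": 10_000}, 1_000.0),
}
print(handle_reload_message(tenant_cache, json.dumps({"tenant": "t_abc123"})))
print(tenant_cache)
```

Malformed messages are ignored rather than raised, so a bad publish cannot take down the listener; the 60-second TTL remains the backstop if a message is missed.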

Per-endpoint rate limit

Some routes need their own limit (for example, POST /upload gets a lower limit than GET /info). The pattern is to extend the GCRA key:

import re

# Normalize the route: request.url.path already excludes the query string;
# path parameters are replaced by a placeholder
def normalize_route(path: str, method: str) -> str:
    # Replace purely numeric segments with a placeholder
    # (the lookahead keeps segments such as /2fa intact)
    normalized = re.sub(r"/\d+(?=/|$)", "/:id", path)
    return f"{method}:{normalized}"

# Per-endpoint GCRA key
route_key = normalize_route(request.url.path, request.method)
gcra_key = f"ratelimit:gcra:{key_hash}:{route_key}"

# route_key examples: "POST:/upload", "GET:/v1/items/:id"

The per-endpoint limit runs alongside (or instead of) the per-key global limit. The usual design is to apply the global limit first and, if the request passes, check the per-endpoint limit when the route has its own config.

11

Rate Limit Headers Convention

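Responses should carry headers that tell the client its rate limit status. The X-RateLimit-* names below are the widely used de facto convention; the IETF draft on RateLimit header fields for HTTP proposes unprefixed RateLimit fields for the same purpose.

X-RateLimit-Limit: the req/s limit of the current tier.
X-RateLimit-Remaining: an estimate of the requests left in the burst window. With GCRA: max(0, (burst_us - (new_tat - now_us)) / period_us).
X-RateLimit-Reset: unix timestamp at which the full burst is available again (for GCRA this is the TAT, so it is approximate).
Retry-After: the minimum number of seconds the client should wait before retrying. This project always sends it with a 429 response (RFC 6585 only says a 429 may include it).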

def build_ratelimit_headers(config: dict, allowed: bool,
                            now_us: int, new_tat: int,
                            period_us: int, burst_us: int,
                            retry_after_us: int) -> dict:
    headers = {
        "X-RateLimit-Limit": str(config["rate_per_sec"]),
    }
    if allowed:
        remaining = max(0, int((burst_us - (new_tat - now_us)) / period_us))
        headers["X-RateLimit-Remaining"] = str(remaining)
        headers["X-RateLimit-Reset"]     = str(int(new_tat / 1_000_000))
    else:
        headers["Retry-After"]           = str(int(retry_after_us / 1_000_000) + 1)
        headers["X-RateLimit-Remaining"] = "0"
    return headers

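Retry-After is rounded up (by adding 1) so the client does not retry before the TAT has fully passed. Note that the Lua script in section 7 returns only {allowed, retry_after_us}; to fill X-RateLimit-Remaining and X-RateLimit-Reset it would also have to return new_tat.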

12

Multi-region & High Throughput Tuning

Multi-region

Each region (for example us-east, eu-west, ap-southeast) has its own Redis Cluster holding the GCRA state. Rate limiting does not need strong cross-region consistency; eventual consistency is acceptable:

If a client sends requests to 2 regions at once, each region applies the rate limit independently. In practice a single tenant rarely has enough throughput for this cross-region bypass to matter.
Pattern: primary region authoritative. If stricter enforcement is needed, pin the tenant to 1 region with GeoDNS or sticky routing.
The tenant config (tenant:<tid>:config) is replicated by the provisioning script when a tier is upgraded; it is not synced in real time.

High throughput tuning

Large connection pool: max_connections=1000 per gateway pod. With 10 pods that is up to 10k connections to the cluster, so Redis maxclients must be large enough (default 10000; raise it if needed).
EVALSHA instead of EVAL: only the SHA is sent instead of the script body, so less data crosses the network per call. EVALSHA latency ≈ GET latency plus a little overhead.
Batching GCRA + INCR + XADD in 1 Lua script: cuts 3 round trips down to 1. Trade-offs: the script is more complex and harder to debug, and on Redis Cluster every key a script touches must hash to the same slot, so the three keys would need a shared hash tag. If p99 is already < 1ms, this is not needed.
Local cache for tenant config: saves 1 HGETALL round trip per request, roughly one in five of the Redis commands in the request flow (tenant config rarely changes).
Cluster slot locality: use a hash tag to group a tenant's keys in 1 slot. Only the text inside the braces is hashed, so the keys must share the same tag, for example ratelimit:gcra:{t_abc123}:<key_hash> and quota:day:{t_abc123}:2026-06-01. A Lua script that touches a single key does not need a hash tag.
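How a hash tag decides the slot can be checked offline. The sketch below follows the Redis Cluster rule (CRC16 of the key, or of the text between the first "{" and the next "}" when that text is non-empty, modulo 16384). The function names are assumptions for illustration; in practice CLUSTER KEYSLOT on a real node gives the authoritative answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Compute the cluster slot for a key, honoring a hash tag if present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag replaces the key as the hashed value.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Hypothetical key names: the shared tag {t_abc123} forces one slot.
tagged = ["ratelimit:gcra:{t_abc123}:deadbeef", "quota:day:{t_abc123}:2026-06-01"]
untagged = ["ratelimit:gcra:deadbeef", "quota:day:t_abc123:2026-06-01"]
print([key_slot(k) for k in tagged])
print([key_slot(k) for k in untagged])
```

The two tagged keys always print the same slot, which is the precondition for touching them in one Lua script or one MULTI block on a cluster.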

13

Hardening

Hash the API key before storing it: SHA-256 hex. The raw key never appears in Redis, logs, or monitoring. The hash is one-way and cannot be reversed.
TLS: on both the client → gateway connection (HTTPS/TLS 1.3) and the gateway → Redis connection (rediss:// with TLS). Rotate certificates periodically.
ACL for the gateway user: grant access only to the key patterns it actually needs.

# ACL rule for the gateway service account
# ~pattern grants keys; &pattern grants Pub/Sub channels (Redis 6.2+)
ACL SETUSER gateway-svc on >strongpassword \
  ~apikey:* ~tenant:* ~ratelimit:gcra:* ~quota:day:* ~audit:* \
  &config:reload \
  +GET +HGETALL +HSET +SET +INCR +EXPIRE +EVALSHA +SCRIPT \
  +XADD +XRANGE +XREVRANGE \
  +SUBSCRIBE +PUBLISH \
  -FLUSHDB -FLUSHALL -CONFIG -DEBUG -KEYS

Rotate API keys: when a leak is detected, delete the old key with DEL apikey:<old_hash> and add a new one. No gateway restart is needed.
Detect anomalies: use the audit stream to spot unusual patterns, for example 1 key calling continuously from many different IPs within 1 minute.
expires_at in the apikey hash: the gateway checks expires_at before doing any other work. An expired key gets 401 immediately; no Redis TTL is needed because the expiry logic lives in the application.
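The expires_at rule (unix timestamp, 0 meaning no expiry) is small enough to isolate in a helper that can be unit tested without Redis. This is a sketch: the name is_key_expired is hypothetical, and treating a malformed value as expired is an assumption made here to fail closed.

```python
import time


def is_key_expired(apikey_hash: dict[str, str], now_ts: float) -> bool:
    """Decide whether an apikey:<key_hash> record has expired.

    `apikey_hash` is the dict returned by HGETALL (string values).
    A missing field or "0" means the key never expires.
    """
    raw = apikey_hash.get("expires_at", "0")
    try:
        expires_at = int(raw)
    except ValueError:
        # Assumption: an unparseable value is treated as expired (fail closed).
        return True
    if expires_at == 0:
        return False
    return now_ts >= expires_at


record = {"tenant_id": "t_abc123", "tier": "paid", "expires_at": "1780000000"}
print(is_key_expired(record, time.time()))
```

Passing the current time in as an argument keeps the function deterministic in tests.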

14

Failure Mode & Circuit Breaker

When Redis is unreachable or returns errors, the gateway must behave in a defined way and must not crash:

Fail open (allow all)

Requests are still proxied to the upstream even though the rate limit cannot be checked. Suitable when:

The upstream can absorb the extra load for a while.
The business requires it: availability matters more than rate enforcement.

Fail closed (block all)

Return 503 for every request. Suitable when:

The upstream is extremely sensitive to overload.
Rate limiting is a hard requirement (billing, security).

Circuit breaker: the recommended approach

import time

class RedisCircuitBreaker:
    """Simple circuit breaker for Redis operations."""

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.last_failure_time: float = 0
        self.state = "closed"  # closed, open, half-open

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "open"

    def record_success(self):
        self.failure_count = 0
        self.state = "closed"

    def is_open(self) -> bool:
        if self.state == "open":
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = "half-open"
                return False
            return True
        return False

circuit_breaker = RedisCircuitBreaker(failure_threshold=5, recovery_timeout=30)

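# Inside the middleware:
if circuit_breaker.is_open():
    # Fail open: keep proxying and skip rate limiting
    return await call_next(request)
try:
    # ... GCRA check ...
    circuit_breaker.record_success()
except Exception:
    circuit_breaker.record_failure()
    return await call_next(request)  # fail open on every Redis error, not only once the threshold is reached

Once the circuit is open, the gateway skips Redis until recovery_timeout seconds have passed since the last failure, then moves to half-open and lets requests try Redis again to see whether it has recovered. A success closes the circuit; a failure reopens it. Note that this simple version lets every request through while half-open, not just a single probe.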
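A stricter half-open state that admits exactly one probe at a time can be sketched as follows. The class name ProbingCircuitBreaker, the allow_request method, and the injectable clock are assumptions for illustration, not the breaker used above. No lock is used because the sketch assumes a single asyncio event loop with no await between the check and the state change.

```python
import time
from typing import Callable


class ProbingCircuitBreaker:
    """Circuit breaker whose half-open state allows a single in-flight probe."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False
        self.state = "closed"  # closed, open, half-open

    def allow_request(self) -> bool:
        """True if the caller should attempt the Redis call."""
        if self.state == "closed":
            return True
        if self.state == "open":
            if self._clock() - self._opened_at < self._recovery_timeout:
                return False
            self.state = "half-open"
            self._probe_in_flight = False
        # half-open: only one probe may be outstanding
        if self._probe_in_flight:
            return False
        self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        self._failures = 0
        self._probe_in_flight = False
        self.state = "closed"

    def record_failure(self) -> None:
        self._failures += 1
        self._probe_in_flight = False
        # A failed probe reopens immediately; otherwise wait for the threshold.
        if self.state == "half-open" or self._failures >= self._threshold:
            self.state = "open"
            self._opened_at = self._clock()


breaker = ProbingCircuitBreaker(failure_threshold=2, recovery_timeout=30.0)
breaker.record_failure()
breaker.record_failure()
print(breaker.state, breaker.allow_request())
```

When allow_request() returns False the middleware would apply its chosen policy (fail open or fail closed) without touching Redis, so a struggling cluster sees one probe per recovery window instead of the full request rate.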

15

Monitoring

Metrics to export (Prometheus + Grafana or equivalent):

Throughput per tier: gateway_requests_total{tier="free|paid|enterprise"}.
429 rate: gateway_rate_limited_total{reason="gcra|daily_quota"}, which separates GCRA from daily quota rejections for debugging.
Latency overhead: histogram gateway_overhead_seconds (gateway processing time, excluding upstream). Alert when p99 > 1ms (0.001 s).
Top tenants by traffic: a counter per tenant_id, used to spot unusual tenants. Watch label cardinality when there are many tenants.
Redis operation latency: redis_command_duration_ms{cmd="evalsha|hgetall|incr|xadd"}.
Circuit breaker state: gauge gateway_circuit_breaker_open (1 = open, 0 = closed).

from prometheus_client import Counter, Histogram

requests_total = Counter(
    "gateway_requests_total",
    "Total requests processed",
    ["tier", "status"],
)
rate_limited = Counter(
    "gateway_rate_limited_total",
    "Requests rejected by rate limiting",
    ["tenant_id", "reason"],
)
overhead_hist = Histogram(
    "gateway_overhead_seconds",
    "Gateway processing time (excluding upstream)",
    buckets=[0.0001, 0.0005, 0.001, 0.002, 0.005, 0.01],
)

# In the middleware: wrap all gateway logic in overhead_hist.time()

16

Deployment

# gateway-deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rate-limit-gateway
spec:
  replicas: 3
  selector:
    matchLabels: { app: gateway }
  template:
    metadata:
      labels: { app: gateway }   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: gateway
        image: rate-limit-gateway:1.0.0
        resources:
          requests: { cpu: "500m", memory: "256Mi" }
          limits:   { cpu: "2000m", memory: "512Mi" }
        env:
        - name: REDIS_URL
          valueFrom:
            secretKeyRef: { name: redis-secret, key: url }
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rate-limit-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target: { type: Utilization, averageUtilization: 60 }

Deploy Redis Cluster with Redis Operator (RedisCluster CR) or the bitnami/redis-cluster Helm chart: 3 masters + 3 replicas, with each master and its replica placed in different AZs.
Ingress NGINX or Envoy acts as the load balancer in front of the gateway pods.
Multi-region: one Deployment + Redis Cluster per region, with GeoDNS (Route53 latency routing or Cloudflare) steering clients to the nearest region.
Secrets for the Redis URL and the API key salt (use Kubernetes Secrets or Vault).

17

Common Gotchas

Not caching the script SHA: calling EVAL instead of EVALSHA on every request sends the full script body over the network and makes Redis hash it each time. Redis caches the compiled script, so nothing is recompiled, but the extra bytes add up at high request rates. Use SCRIPT LOAD at startup and keep the SHA.
Reading tenant config on every request: an HGETALL tenant:<tid>:config per request adds one more command to the four or five the gateway already issues. A 60-second local cache removes most of that load.
Daily quota race condition: GET, compare, then SET is not atomic, so two concurrent requests can both pass. Use INCR (atomic) and compare afterwards.
GCRA TAT serialization: store the TAT as integer microseconds in decimal string form. If it is serialized as binary bytes, tonumber() in Lua returns nil and the GCRA logic breaks.
TTL of the GCRA key: set the TTL to burst_period_us (converted to ms) when writing the TAT. Without a TTL the key lives forever and memory grows (1 key per API key is usually not serious, but it becomes a problem with ephemeral keys).
Multi-region sync conflict: if a tenant sends from 2 regions at once and has just been upgraded, 1 region may keep using the old tier for about 60s (the local cache TTL). This is acceptable for rate limiting; if it is not, lower the local cache TTL.
EVALSHA can fail with NOSCRIPT: after SCRIPT FLUSH, a restart, or a failover to a node that never loaded the script, EVALSHA returns NOSCRIPT. Catch the exception, reload the script with SCRIPT LOAD (or fall back to EVAL), update the cached SHA, and retry.

async def eval_gcra(key_hash: str, period_us: int,
                    burst_us: int, now_us: int) -> tuple:
    global GCRA_SHA
    try:
        result = await redis_client.evalsha(
            GCRA_SHA, 1,
            f"ratelimit:gcra:{key_hash}",
            period_us, burst_us, now_us,
        )
    except aioredis.exceptions.NoScriptError:
        # Script was flushed from the cache: reload it and retry
        GCRA_SHA = await redis_client.script_load(GCRA_LUA)
        result = await redis_client.evalsha(
            GCRA_SHA, 1,
            f"ratelimit:gcra:{key_hash}",
            period_us, burst_us, now_us,
        )
    return int(result[0]), int(result[1])

18

Exercises & Real-world Equivalents

Hands-on exercises

Implement the GCRA Lua script and write unit tests for these cases: first request (key does not exist yet), full burst, rejected request, reset after the TTL.
Write a provisioning script that creates tenants in the 3 tiers (free/paid/enterprise) and verify the result with HGETALL.
Load test with wrk or locust: push 100k req/s through the gateway and measure p50/p99 latency overhead and the 429 rate.
Add Prometheus metrics and a Grafana dashboard for throughput per tier, 429 rate, and the latency histogram.
Implement an audit query endpoint: GET /admin/audit/{key_hash}?n=20 returns the 20 most recent requests from the Redis Stream.
(Advanced) Set up 2 Redis Clusters to simulate 2 regions, check the behavior when 1 region goes down, and measure how long the circuit breaker takes to open and close again.

Real-world equivalents

Kong Gateway: the rate-limiting plugin stores counters in Redis or locally (in memory). It supports consumer-level and per-route limits; there is no built-in GCRA, but a sliding window counter is available (in the Rate Limiting Advanced plugin).
Envoy Proxy: the rate limit filter calls an external rate limit service (usually Lyft's ratelimit service, which uses Redis). This model moves the rate limit logic entirely into a separate service.
Cloudflare Workers: rate limiting is built on Workers KV or Durable Objects rather than on Redis directly, but the GCRA concept carries over.
AWS API Gateway Usage Plans: offers tier-based quotas but no GCRA, and per-key granularity is more limited than in a self-built gateway.

Next lesson

Lesson 123 wraps up the whole Redis series, from data structures, Lua scripts, and Cluster to production hardening, and points out where to go next.
