Quota Context (Deferred Throttling)¶
This is one of Traffik's useful features, and also the one most people don't need on day one. But when you do need it, you'll be very glad it exists.
The Problem¶
Standard throttling is optimistic: consume quota first, then do the work. Most of the time that's fine. But imagine this scenario:
- Client sends a request to generate a report (costs 10 quota)
- Traffik immediately deducts 10 from the quota counter
- The report generation fails halfway through
- The quota is gone, but the client got nothing
Or the reverse:
- You want to check three different throttles before doing expensive work
- If any of them would reject the request, you don't want the others consumed either
- With standard throttling you'd have to hit each throttle, track which ones fired, and manually undo... which you can't
Quota Context solves this. It queues throttle hits and only actually consumes them when you say so — or not at all if something goes wrong.
This is an advanced feature
Quota Context adds real complexity. For most use cases, standard Depends(throttle) is simpler and should be preferred. Use Quota Context when you have specific conditional consumption requirements.
QuotaContext is NOT atomic
QuotaContext consumption does not guarantee "all or nothing" like an ACID transaction. It prevents optimistic over-consumption (quota is not deducted until apply() is called), but if a context partially succeeds before failing during apply() — for example, if entry 3 of 5 fails — the quota already consumed by entries 1 and 2 is not rolled back. The cost of wasted quota falls on the API, not the client. Design your quota contexts accordingly: keep them small, and use apply_on_error=False (the default) to avoid charging clients for server-side failures.
Bound Mode¶
The most common usage: create a context tied to one throttle.
from fastapi import FastAPI, Request, Depends
from traffik import HTTPThrottle
from traffik.backends.redis import RedisBackend
backend = RedisBackend("redis://localhost:6379", namespace="myapp")
app = FastAPI(lifespan=backend.lifespan)
throttle = HTTPThrottle("api:reports", rate="50/hour", backend=backend)
@app.post("/reports/generate")
async def generate_report(request: Request):
# Queue throttle hits — they are NOT consumed yet
async with throttle.quota(request) as quota:
await quota(cost=10) # Queued: cost=10
await quota(cost=5) # Aggregated! Still one entry: cost=15
# Do the expensive work
report = await generate_expensive_report()
# Context exits successfully -> quota consumed (15 units deducted)
return {"report": report}
If generate_expensive_report() raises an exception, the context exits with an error — and no quota is consumed. The client doesn't pay for a failed operation.
Unbound Mode¶
When you need to control multiple throttles together, use QuotaContext directly:
from traffik.quotas import QuotaContext
burst_throttle = HTTPThrottle("api:burst", rate="20/min", backend=backend)
daily_throttle = HTTPThrottle("api:daily", rate="500/day", backend=backend)
@app.post("/exports")
async def create_export(request: Request):
async with QuotaContext(request) as quota:
# Must specify throttle explicitly in unbound mode
await quota(burst_throttle, cost=5)
await quota(daily_throttle, cost=5)
result = await run_export()
return {"export": result}
Both throttles are consumed atomically on successful exit, or neither is consumed on failure.
Cost Aggregation¶
Consecutive calls with the same throttle and identical configuration are automatically merged into a single backend operation. This is a performance optimization — fewer round-trips to the backend.
async with throttle.quota(request) as quota:
await quota(cost=2) # Entry 1: cost=2
await quota(cost=3) # Aggregated into Entry 1: cost=5
await quota() # Aggregated into Entry 1: cost=6
await quota(other_throttle) # Entry 2: different throttle, new entry
await quota(cost=1) # Entry 1 again? No — other_throttle broke the streak
# This becomes Entry 3
Aggregation breaks when: different throttle, different context, different retry config, or a cost function is used (can't know the cost until apply time).
Conditional Consumption¶
apply_on_error¶
Control whether quota is consumed when an exception occurs:
# Default: don't consume on any exception
async with throttle.quota(request, apply_on_error=False) as quota:
await quota(cost=5)
result = await risky_operation() # Exception -> no quota consumed
# Always consume, even on exceptions (use sparingly)
async with throttle.quota(request, apply_on_error=True) as quota:
await quota(cost=5)
await risky_operation() # Exception -> quota still consumed
# Consume only for specific exception types
from fastapi import HTTPException
async with throttle.quota(request, apply_on_error=(ValueError,)) as quota:
await quota(cost=5)
await risky_operation()
# ValueError -> consume quota
# Any other exception -> don't consume quota
apply_on_exit=False¶
Disable automatic consumption on exit and manage it yourself:
async with throttle.quota(request, apply_on_exit=False) as quota:
await quota(cost=5)
is_valid = await validate_business_rules()
if not is_valid:
await quota.cancel() # Discard — no quota consumed
return {"error": "validation_failed"}
result = await process()
await quota.apply() # Manually consume quota after success
return {"result": result}
Manual apply() and cancel()¶
async with throttle.quota(request, apply_on_exit=False) as quota:
await quota(cost=10)
try:
result = await complex_operation()
await quota.apply() # Success: consume
except ExpectedError:
await quota.cancel() # Known failure: don't consume
raise
apply() is idempotent — calling it twice is safe. cancel() is final — you cannot un-cancel a context.
Pre-checking with check()¶
Sometimes you want to check if quota would be available before committing to an operation:
# Check at the throttle level
if not await throttle.check(request, cost=10):
raise HTTPException(429, "Insufficient quota for this operation")
# Check inside a quota context
async with throttle.quota(request) as quota:
await quota(cost=10)
if not await quota.check(): # Check using the owner throttle
await quota.cancel()
raise HTTPException(429, "Rate limit would be exceeded")
result = await expensive_operation()
TOCTOU caveat
check() is a best-effort snapshot. Between the check and the actual consumption, another request from the same client could use up the remaining quota. For strong guarantees, use lock=True (see below) or design your system to handle ConnectionThrottled exceptions gracefully.
Nested Contexts¶
Child contexts merge their queued entries into the parent when they exit successfully:
async with throttle.quota(request) as parent:
await parent(cost=2)
async with parent.nested() as child:
await child(cost=1)
# child exits successfully -> merges cost=1 into parent's queue
# Parent's queue now has cost=3 total
await parent(cost=1) # Aggregated: still same entry if config matches
# All consumed here: 4 total units
The parent acquires the lock (if configured). Child contexts under a parent don't acquire their own lock by default — they operate under the parent's lock context.
Locking¶
Enable locking to make the entire quota context atomic with respect to other contexts using the same key:
# Use throttle UID as lock key (simplest)
async with throttle.quota(request, lock=True) as quota:
await quota(cost=5)
result = await process() # Lock held for entire duration — keep this fast!
# Custom lock key
async with throttle.quota(request, lock="user:123:api_calls") as quota:
await quota(cost=5)
# Custom lock config
async with throttle.quota(
request,
lock=True,
lock_config={"ttl": 30, "blocking_timeout": 5}
) as quota:
await quota(cost=5)
Keep locked contexts fast
The lock is held for the entire duration of the async with block. Long-running operations (database queries, external API calls) inside a locked context will block other requests waiting for the same lock, increasing latency significantly.
Retry with Backoff¶
Individual quota entries can be configured to retry on failure:
from traffik.backoff import ExponentialBackoff
async with throttle.quota(request) as quota:
await quota(
cost=5,
retry=3, # Up to 3 retries
backoff=ExponentialBackoff(multiplier=2.0, max_delay=10.0),
base_delay=0.5, # Start with 0.5s delay
)
await quota(
cost=1,
retry=2,
retry_on=(TimeoutError,), # Only retry on timeouts
)
Backoff Strategies¶
| Strategy | Behavior | Best For |
|---|---|---|
ConstantBackoff |
Same delay every retry | Simple retry logic, predictable timing |
LinearBackoff(increment=1.0) |
Delay grows by increment each attempt |
Gradual back-pressure |
ExponentialBackoff(multiplier=2.0) |
Delay doubles each attempt, with optional jitter | Production retries, thundering-herd prevention |
LogarithmicBackoff(base=2.0) |
Delay grows logarithmically | Many retries with diminishing returns |
The default backoff when retry > 0 is ExponentialBackoff(multiplier=2.0, max_delay=60.0, jitter=True).
Inspecting the Context¶
You can inspect a quota context's state at any time:
async with throttle.quota(request) as quota:
await quota(cost=5)
await quota(cost=3)
print(quota.queued_cost) # 8 (estimated, excludes cost functions)
print(quota.applied_cost) # 0 (nothing consumed yet)
print(quota.active) # True (not consumed or cancelled)
print(quota.consumed) # False
print(quota.cancelled) # False
print(quota.is_bound) # True (created via throttle.quota())
print(quota.is_nested) # False
print(quota.depth) # 0
After the context exits:
Limitations¶
Be aware of these before reaching for QuotaContext:
| Limitation | Detail |
|---|---|
| No rollback on partial failure | If entry 3 of 5 fails during apply(), entries 1 and 2 are already consumed |
TOCTOU with check() |
Quota can change between check() and apply() — use locks for strong consistency |
cancelled is final |
Once cancelled, a context cannot be un-cancelled or re-used |
apply() is idempotent |
Calling it multiple times only consumes once — safe but not a retry mechanism |
| Nested lock ordering | Acquiring locks in different orders in nested contexts can deadlock |
| Cost functions resolved at apply | If your throttle uses a cost function, queued_cost is an estimate |