How We Reduced API Response Times by 60%

The Problem

As our user base grew, we noticed API response times creeping upward. P95 latency had reached 320ms, and some endpoints were significantly slower during peak hours. We set a target: bring P95 below 130ms without sacrificing functionality.

Identifying the Bottlenecks

We started by instrumenting our most-hit endpoints. The usual suspects emerged:

N+1 queries — Several endpoints were loading related data in loops instead of eager loading
Missing indexes — Frequently filtered columns lacked proper database indexes
Redundant queries — The same data was being fetched multiple times within a single request
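To make the N+1 pattern concrete, here is a minimal sketch that counts the statements a handler issues. It is not our production code. It uses Python's built-in sqlite3 module and its set_trace_callback hook, and the workspaces/members schema and all function names are hypothetical. The loop version issues one extra query per workspace. The eager version fetches every member in a single IN query.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


def make_db(workspace_count: int = 10) -> sqlite3.Connection:
    """Build a small in-memory database with a hypothetical workspaces/members schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE members (id INTEGER PRIMARY KEY, workspace_id INTEGER, email TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO workspaces (id, name) VALUES (?, ?)",
        [(i, f"ws{i}") for i in range(1, workspace_count + 1)],
    )
    conn.executemany(
        "INSERT INTO members (workspace_id, email) VALUES (?, ?)",
        [(i, f"user{j}@ws{i}.test") for i in range(1, workspace_count + 1) for j in range(3)],
    )
    conn.commit()
    return conn


def members_n_plus_one(conn: sqlite3.Connection) -> Dict[int, List[str]]:
    """One query for the workspaces, then one more query per workspace (the N+1 shape)."""
    result: Dict[int, List[str]] = {}
    for (ws_id,) in conn.execute("SELECT id FROM workspaces ORDER BY id").fetchall():
        rows = conn.execute(
            "SELECT email FROM members WHERE workspace_id = ? ORDER BY id", (ws_id,)
        ).fetchall()
        result[ws_id] = [email for (email,) in rows]
    return result


def members_eager(conn: sqlite3.Connection) -> Dict[int, List[str]]:
    """Two queries total: the workspaces, then all their members in one IN query."""
    ws_ids = [ws_id for (ws_id,) in conn.execute("SELECT id FROM workspaces ORDER BY id")]
    result: Dict[int, List[str]] = {ws_id: [] for ws_id in ws_ids}
    placeholders = ",".join("?" * len(ws_ids))  # "?,?,...,?" one per workspace
    rows = conn.execute(
        f"SELECT workspace_id, email FROM members WHERE workspace_id IN ({placeholders}) ORDER BY id",
        ws_ids,
    )
    for ws_id, email in rows:
        result[ws_id].append(email)
    return result


def count_queries(
    conn: sqlite3.Connection, handler: Callable[[sqlite3.Connection], object]
) -> Tuple[object, int]:
    """Run handler and count each SQL statement SQLite reports through the trace hook."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        result = handler(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


conn = make_db()
slow, slow_count = count_queries(conn, members_n_plus_one)
fast, fast_count = count_queries(conn, members_eager)
assert slow == fast  # same data either way
print(slow_count, fast_count)  # 11 2
```

With 10 workspaces the loop costs 1 + 10 statements while the eager version stays at 2 regardless of how many workspaces there are, which is why the N+1 shape tends to show up as latency that grows with data size.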

The Fixes

Database Layer

We added composite indexes for our most common query patterns. For the workspace members endpoint alone, this reduced query time from 45ms to 3ms. We also found and fixed 12 N+1 query patterns across the codebase, replacing per-row lookups in loops with eager loading.
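A quick way to confirm that a composite index is actually used is to compare query plans before and after creating it. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The members table, its role column, and the index name are hypothetical stand-ins, and production databases such as PostgreSQL or MySQL have their own EXPLAIN output. Ordering the index columns by how queries filter on them also lets a query on only the leading column (workspace_id here) reuse the same index.

```python
import sqlite3
from typing import List, Sequence


def plan(conn: sqlite3.Connection, sql: str, params: Sequence) -> List[str]:
    """Return the human-readable detail column (index 3) of EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


BOTH_COLUMNS = "SELECT email FROM members WHERE workspace_id = ? AND role = ?"
LEADING_COLUMN = "SELECT email FROM members WHERE workspace_id = ?"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (id INTEGER PRIMARY KEY, workspace_id INTEGER, role TEXT, email TEXT)"
)

before = plan(conn, BOTH_COLUMNS, (1, "admin"))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_members_workspace_role ON members (workspace_id, role)")
after = plan(conn, BOTH_COLUMNS, (1, "admin"))  # search via the composite index
prefix = plan(conn, LEADING_COLUMN, (1,))  # leftmost-prefix query can use it too

print(before)
print(after)
print(prefix)
```

The exact wording of the plan text varies between SQLite versions (for example "SCAN members" versus "SCAN TABLE members"), so it is safer to look for the index name than to match whole strings.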

Caching Strategy

We introduced a multi-layer caching approach. Frequently accessed but rarely changed data (such as workspace settings and plan details) is cached with a 5-minute TTL. User-specific data uses shorter TTLs and is invalidated whenever it is written.
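The TTL-plus-invalidation policy can be sketched as a small in-process cache. This is a minimal illustration under stated assumptions, not our actual caching layer. It uses a single in-memory dictionary rather than multiple layers. The 300-second TTL mirrors the 5-minute figure for settings, while the 30-second TTL for user data, the key names, and the read/write helpers are hypothetical. A clock is injected so expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """In-memory cache where each entry expires ttl_seconds after it was loaded."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, ttl_seconds: float, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: serve from cache
        value = loader()  # missing or expired: go back to the source of truth
        self._entries[key] = (now + ttl_seconds, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry so the next read reloads it."""
        self._entries.pop(key, None)


SETTINGS_TTL = 300  # 5 minutes, for rarely changed data such as workspace settings
USER_TTL = 30  # hypothetical shorter TTL for user-specific data


class FakeClock:
    """Manually advanced clock so expiry can be demonstrated deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
cache = TTLCache(clock)
database = {"settings:1": {"theme": "dark"}, "profile:7": {"name": "Ada"}}  # hypothetical store
loads: List[str] = []  # records every trip to the "database"


def read(key: str, ttl: float) -> Any:
    def load() -> Any:
        loads.append(key)
        return dict(database[key])

    return cache.get_or_load(key, ttl, load)


def write_profile(key: str, value: dict) -> None:
    database[key] = value
    cache.invalidate(key)  # invalidate on write so the next read fetches the new value


read("settings:1", SETTINGS_TTL)  # miss: loads from the database
clock.now = 299
read("settings:1", SETTINGS_TTL)  # hit: still inside the 5-minute window
clock.now = 301
read("settings:1", SETTINGS_TTL)  # expired: reloads
read("profile:7", USER_TTL)  # miss: loads
write_profile("profile:7", {"name": "Ada L."})
print(read("profile:7", USER_TTL))  # {'name': 'Ada L.'}
print(loads)  # ['settings:1', 'settings:1', 'profile:7', 'profile:7']
```

Invalidation on write keeps a single process consistent; with several application servers sharing a cache, the invalidation has to reach the shared store as well, which is one reason caches are often layered.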

Response Optimization

We reviewed our API resources and trimmed unnecessary fields from responses. Some endpoints were returning full nested objects when only an ID was needed. These changes reduced the average response payload size by 40%.
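In practice, this trimming means having the serializer emit a related object's ID instead of embedding the whole object. The sketch below compares the two shapes for a hypothetical member record. The field names and data are invented for illustration, so the size difference it prints is not meant to reproduce the 40% figure.

```python
import json
from typing import Any, Dict

# Hypothetical member record as loaded from the database, with its workspace attached.
member: Dict[str, Any] = {
    "id": 42,
    "email": "ada@example.test",
    "workspace": {
        "id": 7,
        "name": "Acme",
        "plan": {"id": 3, "name": "Pro", "seats": 50},
        "settings": {"theme": "dark", "locale": "en"},
    },
}


def serialize_nested(m: Dict[str, Any]) -> Dict[str, Any]:
    """Before: embed the full workspace, including its plan and settings."""
    return {"id": m["id"], "email": m["email"], "workspace": m["workspace"]}


def serialize_trimmed(m: Dict[str, Any]) -> Dict[str, Any]:
    """After: clients that only need the relationship get the workspace ID."""
    return {"id": m["id"], "email": m["email"], "workspace_id": m["workspace"]["id"]}


def payload_bytes(body: Dict[str, Any]) -> int:
    """Size of the compact JSON encoding, before any HTTP compression."""
    return len(json.dumps(body, separators=(",", ":")).encode("utf-8"))


nested_size = payload_bytes(serialize_nested(member))
trimmed_size = payload_bytes(serialize_trimmed(member))
print(nested_size, trimmed_size)
```

Clients that do need the full workspace can fetch it once and reuse it, instead of receiving a copy embedded in every member.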

Results

After rolling out these changes over two weeks, our numbers improved dramatically:

P50 latency: 180ms → 65ms
P95 latency: 320ms → 120ms
P99 latency: 850ms → 280ms
Database query count per request: reduced by 55% on average
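For readers checking the arithmetic, the sketch below computes the relative reduction at each reported percentile from the numbers above. All three fall between about 62% and 67%, in line with the headline figure, and the new P95 sits under the 130ms target. It also includes a nearest-rank percentile helper that shows how P50/P95/P99 can be derived from raw latency samples. The nearest-rank method is an assumption chosen for simplicity; monitoring tools often interpolate between samples instead.

```python
import math
from typing import Dict, Sequence, Tuple


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def reduction_pct(before: float, after: float) -> float:
    """Relative improvement, as a percentage of the starting value."""
    return (before - after) / before * 100


# Reported latencies in milliseconds: (before, after).
reported: Dict[str, Tuple[float, float]] = {
    "P50": (180, 65),
    "P95": (320, 120),
    "P99": (850, 280),
}

for name, (before, after) in reported.items():
    print(f"{name}: {before:g}ms -> {after:g}ms, {reduction_pct(before, after):.1f}% lower")
# P50: 180ms -> 65ms, 63.9% lower
# P95: 320ms -> 120ms, 62.5% lower
# P99: 850ms -> 280ms, 67.1% lower
```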

Performance work is never truly done, but these foundational improvements give us a solid base to build on as we continue to scale.