API Rate Limit
Also known as: Rate Limiting, API Throttling, Request Throttling
API rate limits cap how many requests your systems can send to a service in a given window, throttling traffic to protect performance and prevent abuse.
Definition
An API rate limit is a ceiling on the number of API calls a client can make to a service over a defined time window, typically expressed as requests per second, minute, hour, or day. When you exceed the cap, the service usually rejects the call with an HTTP 429 (Too Many Requests) status and keeps rejecting further calls until the window resets.
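A 429 response may carry an optional Retry-After header that says how long to wait, either as a number of seconds or as an HTTP date. Not every service sends it, so the sketch below falls back to a default. The function name and the one-second default are illustrative assumptions, not part of any particular API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Return how long to wait, in seconds, given a Retry-After header value.

    `default` is an assumed fallback for a missing or unparseable header.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: a whole number of seconds, e.g. "30".
        return float(value)
    try:
        # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the caller can retry immediately.
    return max(0.0, (when - now).total_seconds())
```

Passing the header from a throttled response into this function gives a wait time your worker can sleep on before retrying.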
In practice, every integration your stack depends on — payment processors, email senders, CRM data syncs, AI providers, shipping APIs — enforces some flavor of rate limit. Operators feel these limits when bulk imports stall, automations skip records, or a sync job mysteriously drops data during peak hours.
Rate limits differ from quotas (which usually meter total monthly volume tied to a billing plan) and from concurrency limits (which cap simultaneous in-flight requests regardless of total throughput). All three can fire on the same endpoint, and diagnosing which one tripped is half the battle.
Why It Matters
Rate limits directly shape what your team can ship and how fast. A migration that should take an hour can stretch across a weekend if you don't account for the destination system's per-minute cap, and a customer-facing feature can degrade silently when a third-party API throttles your background workers during a traffic spike.
Teams that ignore rate limits ship integrations that work fine in testing and fail in production. Common fallout: duplicate records when retries aren't idempotent, partial syncs that leave your CRM and billing system out of step, automation queues that back up for hours, and support tickets you can't reproduce because the error only appears at scale.
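One common way to keep retries from creating duplicate records is an idempotency key: the client generates one key per logical operation and reuses it on every retry, and the service returns the original result instead of writing again. The sketch below simulates that with an in-memory stand-in. FakeRecordStore, create_once, and the lost-response behavior are all hypothetical, and real services differ in whether and how they accept such keys.

```python
import uuid


class FakeRecordStore:
    """Hypothetical stand-in for a remote API that honors idempotency keys."""

    def __init__(self, lose_first_response=False):
        self.records = []
        self._by_key = {}
        self._lose_next = lose_first_response

    def create(self, idempotency_key, payload):
        # Write only if this key has never been seen before.
        if idempotency_key not in self._by_key:
            record = {"id": len(self.records) + 1, **payload}
            self.records.append(record)
            self._by_key[idempotency_key] = record
        if self._lose_next:
            # Simulate the risky case: the write succeeded but the reply was lost.
            self._lose_next = False
            raise TimeoutError("response lost after the write succeeded")
        return self._by_key[idempotency_key]


def create_once(store, payload, max_attempts=3):
    """Create a record, retrying on timeouts with the same idempotency key."""
    key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
    for _ in range(max_attempts):
        try:
            return store.create(key, payload)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")
```

With FakeRecordStore(lose_first_response=True), the first attempt times out after writing, the retry reuses the key, and the store still ends up holding a single record.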
Examples in Practice
A 40-person ecommerce brand runs a nightly job to push order data into their fulfillment platform. The fulfillment API caps writes at 120/minute, so the engineering team adds a token-bucket throttle and exponential backoff — the job now takes 22 minutes instead of failing halfway through on Black Friday.
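The two techniques in the first example can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the class and function names, the burst capacity, and the choice of full jitter for the backoff are assumptions.

```python
import random
import time


class TokenBucket:
    """Client-side throttle: allows bursts up to `capacity`, then a steady rate."""

    def __init__(self, rate_per_minute, capacity, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def wait_time(self):
        """Seconds until one token is available (0.0 if one is available now)."""
        self._refill()
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.rate

    def acquire(self, sleep=time.sleep):
        """Block until a token is available, then spend it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
            self._refill()
        self.tokens -= 1


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] seconds, for attempt = 0, 1, 2, ..."""
    return rng() * min(cap, base * 2 ** attempt)
```

A job would call bucket.acquire() before each write to stay under the 120/minute cap, and sleep for backoff_delay(attempt) before retrying any write the API still rejects. The jitter spreads retries out so parallel workers do not all retry at the same instant.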
A B2B SaaS support team uses an AI agent to draft replies on inbound tickets. The AI provider enforces both a requests-per-minute and a tokens-per-minute limit, so the ops lead configures the agent to queue overflow tickets and process them once the next window opens rather than dropping them.
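Queueing against two limits at once means a batch closes as soon as either the request count or the token count would go over. The sketch below plans that out, assuming fixed one-minute windows and a known token estimate per ticket. Both are simplifications: providers may use sliding windows, and plan_windows is a hypothetical helper, not a provider API.

```python
from collections import deque


def plan_windows(ticket_tokens, rpm, tpm):
    """Assign tickets, in order, to one-minute windows without exceeding
    either the requests-per-minute (rpm) or tokens-per-minute (tpm) limit.

    Returns a list of windows, each a list of ticket indexes.
    """
    if rpm < 1:
        raise ValueError("rpm must be at least 1")
    queue = deque(enumerate(ticket_tokens))
    windows = []
    while queue:
        used_requests = 0
        used_tokens = 0
        batch = []
        while queue and used_requests < rpm:
            idx, tokens = queue[0]
            if tokens > tpm:
                # This ticket can never fit in a single window.
                raise ValueError(f"ticket {idx} needs {tokens} tokens, above the cap")
            if used_tokens + tokens > tpm:
                break  # token budget exhausted: overflow waits for the next window
            queue.popleft()
            batch.append(idx)
            used_requests += 1
            used_tokens += tokens
        windows.append(batch)
    return windows
```

For example, plan_windows([400, 400, 400, 100], rpm=3, tpm=1000) returns [[0, 1], [2, 3]]: the third ticket would push the first window past the token limit, so it waits for the next one even though the request limit still has room.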
A mid-market agency syncs contacts between their CRM and their email platform every 15 minutes. After a marketing import adds 50,000 records, the sync hits the email platform's hourly write cap and pauses — the integration auto-resumes once the window resets, but reporting goes stale for two hours.