March 4, 2026

Beyond N+1: The Performance Problems That Don't Show Up Until Production Scale

9 min read

You've fixed the N+1 queries. You've added database indexes. You've moved email delivery to background jobs. Your app runs beautifully in development, passes CI without issue, and performs well in staging.

Then it hits production. Real traffic, real data, real concurrency. And a new category of problems emerges — the kind that doesn't show up in a test suite or on a dataset with 500 rows.

These are the performance problems I encounter most often in production Rails applications that have outgrown their initial architecture. They're not beginner mistakes. They're the traps that experienced teams fall into because the problems are invisible at small scale and catastrophic at large scale.

ActiveRecord Callbacks That Seemed Harmless

Callbacks are one of Rails' most convenient features and one of its most dangerous at scale. The problem isn't that callbacks exist — it's that they create invisible execution chains that grow in cost as your application grows in complexity.

Consider a typical healthcare scheduling platform. A patient books an appointment. The Appointment model has an after_create callback that sends a confirmation notification. Reasonable. Then the team adds an after_save callback that syncs to the calendar service. Then an after_commit callback that updates the provider's availability cache. Then another after_save that recalculates the clinic's daily capacity metrics.

Each callback made sense when it was added. But now creating a single appointment triggers a notification service call, a calendar API sync, a cache invalidation, and a metrics recalculation — all inline with the save operation. At 10 appointments per day, nobody notices. At 10,000, the accumulated latency per save is noticeable. At 100,000, you're experiencing cascading timeouts because a third-party calendar API is adding 200ms to every write operation that touches the appointments table.

The deeper issue is that callbacks hide control flow. A developer looking at a controller action that calls @appointment.save! has no idea how much work that save actually triggers unless they go read every callback on the model and every callback on every associated model that might be touched. This makes performance problems nearly impossible to diagnose from the call site.

The pattern I've seen work at scale is to treat callbacks as a code smell for anything beyond simple data integrity concerns. Setting a UUID before create? That's a callback. Sending a notification? That's a service object. Syncing to an external system? That's a background job. Recalculating aggregate metrics? That's an async event.

The discipline is: if the work doesn't need to happen for the record to be valid, it doesn't belong in a callback.
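A minimal sketch of that discipline in plain Ruby. The class names (`Appointments::Create`, `ConfirmationJob`, `CalendarSyncJob`) are illustrative, and `save!` is simulated rather than hitting a database, so the shape is visible without a Rails app:

```ruby
class Appointment
  attr_reader :id

  def save!
    @id ||= 42 # stand-in for the database write
    self
  end
end

class ConfirmationJob
  def self.enqueued
    @enqueued ||= []
  end

  def self.perform_later(appointment_id)
    enqueued << appointment_id
  end
end

class CalendarSyncJob
  def self.enqueued
    @enqueued ||= []
  end

  def self.perform_later(appointment_id)
    enqueued << appointment_id
  end
end

module Appointments
  class Create
    # Every side effect the save triggers is visible here at the call
    # site, not hidden in a chain of model callbacks.
    def self.call(appointment)
      appointment.save!
      ConfirmationJob.perform_later(appointment.id)
      CalendarSyncJob.perform_later(appointment.id)
      appointment
    end
  end
end
```

The save itself stays minimal; anything that isn't required for the record to be valid becomes an explicit, asynchronous step that a reader can see from the call site.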

Background Job Queues That Become the Bottleneck

Moving work to background jobs solves the request-cycle latency problem. But it creates a new one: the jobs themselves need to be able to keep up with the rate at which they're enqueued.

I see this pattern often: a team starts offloading work to Sidekiq or Solid Queue and everything runs smoothly for months. Then traffic grows, more types of work get moved to background jobs, and gradually the queue depth starts climbing. Jobs that used to process in seconds now wait minutes in the queue. The system isn't failing — it's falling behind.

The most common causes are predictable. First, jobs that are too coarse-grained. A single job that processes an entire batch — "generate all invoices for the month" — will monopolize a worker for minutes or hours while smaller, time-sensitive jobs wait behind it. Break batch jobs into individual units of work. Enqueue 10,000 small jobs instead of one big one. This lets the queue system distribute work across workers and interleave time-sensitive jobs.
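The fan-out looks something like this sketch, where `GenerateInvoiceJob` and `FakeQueue` are illustrative stand-ins for a real Sidekiq or Solid Queue worker and its backend:

```ruby
class FakeQueue
  def self.jobs
    @jobs ||= []
  end

  def self.enqueue(job_class, *args)
    jobs << [job_class, args]
  end
end

class GenerateInvoiceJob
  # One small unit of work: a single account's invoice. Thousands of
  # these can be spread across workers and interleaved with urgent jobs.
  def self.perform_later(account_id)
    FakeQueue.enqueue(self, account_id)
  end
end

# Instead of one "generate all invoices for the month" job, enqueue
# one job per account.
(1..10_000).each { |account_id| GenerateInvoiceJob.perform_later(account_id) }
```

Each small job also becomes independently retryable: one failed invoice retries alone instead of re-running the entire batch.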

Second, not separating queues by priority. When appointment confirmations sit in the same queue as nightly analytics rollups, a spike in analytics work directly delays patient-facing notifications. Define separate queues — critical, default, low — and allocate workers accordingly.

Third, jobs that make external API calls without timeouts or circuit breakers. One slow third-party service can consume all your workers, leaving them blocked on HTTP responses while the rest of your queue backs up. Every external call inside a background job needs a timeout and, ideally, a circuit breaker that stops attempting calls to a degraded service.
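A minimal timeout-plus-circuit-breaker sketch in plain Ruby. The threshold and class name are assumptions for illustration; production apps often reach for an HTTP client's built-in timeout options plus a dedicated breaker gem instead:

```ruby
require "timeout"

class CircuitBreaker
  OpenError = Class.new(StandardError)

  def initialize(failure_threshold: 3)
    @failure_threshold = failure_threshold
    @failures = 0
  end

  def open?
    @failures >= @failure_threshold
  end

  # Wrap every external call: fail fast once the service looks degraded,
  # and bound each attempt with a hard timeout so a worker can't hang.
  def call(timeout_seconds: 2, &block)
    raise OpenError, "circuit open, skipping call" if open?

    begin
      result = Timeout.timeout(timeout_seconds, &block)
      @failures = 0 # a success closes the circuit again
      result
    rescue StandardError
      @failures += 1
      raise
    end
  end
end
```

With this in place, a degraded calendar API costs one fast `OpenError` per job instead of tying up a worker for the full timeout on every attempt.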

The moment you start relying on background jobs as a core part of your application's architecture — not just for sending emails, but for processing payments, syncing data, generating documents — you need to treat your queue infrastructure with the same seriousness as your web servers. Monitor queue depth. Alert on latency. Capacity plan for peak load.

Race Conditions That Only Appear Under Concurrency

In development, requests arrive one at a time. In production, they arrive simultaneously. And code that works perfectly in serial can produce corrupted data or duplicate operations when executed concurrently.

The classic example: a patient has a wallet balance in your system. Two requests come in at the same time — one for a co-pay deduction and one for a refund credit. Both read the current balance, both compute their adjustment, both write the new balance. One of the writes is silently lost. The data is now wrong, and there's no error in the logs.

This is a textbook race condition, and it's more common in Rails applications than most teams realize. ActiveRecord's default behavior does nothing to prevent concurrent updates from stomping on each other.

The most robust solution is optimistic locking with a lock_version column, which causes the second write to fail explicitly so you can handle the retry. For operations that absolutely must be serialized — like financial transactions — pessimistic locking with SELECT FOR UPDATE ensures only one process can modify a record at a time.
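Here is a sketch of the optimistic-locking semantics, mirroring what ActiveRecord does with a `lock_version` column. The `Wallet` class is an in-memory stand-in, not real ActiveRecord, so the mechanics are explicit:

```ruby
class Wallet
  StaleObjectError = Class.new(StandardError)
  attr_reader :balance, :lock_version

  def initialize(balance)
    @balance = balance
    @lock_version = 0
  end

  # A write only succeeds if the record hasn't changed since it was read.
  def update_balance(new_balance, read_version:)
    raise StaleObjectError, "record changed since read" if read_version != @lock_version

    @balance = new_balance
    @lock_version += 1
  end
end

wallet = Wallet.new(100)

# Two concurrent "requests" read the same state...
version_seen_by_copay  = wallet.lock_version
version_seen_by_refund = wallet.lock_version

# ...the co-pay deduction wins the race...
wallet.update_balance(wallet.balance - 30, read_version: version_seen_by_copay)

# ...and the stale refund fails explicitly instead of silently clobbering it.
begin
  wallet.update_balance(120, read_version: version_seen_by_refund) # 100 + 20, stale
rescue Wallet::StaleObjectError
  # The caller re-reads and retries against fresh state: 70 + 20 = 90.
  wallet.update_balance(wallet.balance + 20, read_version: wallet.lock_version)
end
```

Without the version check, the second write would land and the co-pay deduction would vanish with no error anywhere.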

But beyond row-level locking, there are subtler concurrency problems at scale. Uniqueness validations that pass in application code but fail at the database level because two requests validated simultaneously. Background jobs that process the same record because the job was enqueued twice during a retry. Cron-scheduled tasks that overlap because the previous run hasn't finished when the next one starts.
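The duplicate-job case yields to an idempotency key. A sketch, with the caveat that in a real app the processed set would be a database table with a unique index on the key (an in-process `Set` only illustrates the check-and-mark shape):

```ruby
require "set"

class ChargeJob
  # Assumption for illustration: in production this lives in the database
  # behind a unique index, not in process memory.
  @processed = Set.new

  class << self
    def perform(idempotency_key)
      # Set#add? returns nil when the key was already present, so the
      # check and the mark happen in one step.
      return :skipped unless @processed.add?(idempotency_key)

      :charged # the real side effect would happen here
    end
  end
end
```

A job enqueued twice during a retry then charges once and skips once, regardless of which worker runs first.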

The mental shift required is this: any operation that reads a value and then writes a decision based on that value is vulnerable to concurrency issues if another process can do the same thing between the read and the write. In development, that window is theoretical. In production, it's constantly being hit.

Migrations That Lock Your Busiest Tables

The migration that ran in 0.4 seconds on staging takes an ACCESS EXCLUSIVE lock on a table with 50 million rows in production and blocks every read and write for 20 minutes. I've seen this take down production systems that had been running for years without incident.

The dangerous operations are well-known but still catch experienced teams off guard. Adding a column with a default value (fixed in PostgreSQL 11+, but still dangerous with constraints). Adding an index without the CONCURRENTLY flag. Adding a NOT NULL constraint that requires a full table scan to validate. Renaming a column, which locks the table and invalidates running application code simultaneously.

The core issue is that staging doesn't reproduce this. Your staging database has thousands of rows. Production has millions. The migration completes instantly in staging and locks production for minutes. The difference isn't the migration itself — it's the interaction between DDL locks and the volume of concurrent queries in production.

The defenses are straightforward but require discipline. Use strong_migrations or safe-pg-migrations to automatically catch dangerous operations before they reach production. Always create indexes concurrently with disable_ddl_transaction! and algorithm: :concurrently. Add columns without defaults, then backfill in batches, then add the default and constraint separately. Set lock_timeout to a few seconds so that a migration will fail fast rather than block your entire application.
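A sketch of a safe concurrent index build following those rules. A real migration inherits from ActiveRecord::Migration; `MigrationSketch` is a minimal stand-in that records DDL calls instead of executing them, so the example runs outside a Rails app:

```ruby
class MigrationSketch
  def self.calls
    @calls ||= []
  end

  def self.disable_ddl_transaction!
    calls << :disable_ddl_transaction!
  end

  def self.run
    new.change
  end

  # Record any DDL call (add_index, execute, ...) rather than running it.
  def method_missing(name, *args, **opts)
    self.class.calls << [name, args, opts]
  end

  def respond_to_missing?(*)
    true
  end
end

class AddStatusIndexToAppointments < MigrationSketch
  # CONCURRENTLY cannot run inside a transaction, so opt out first.
  disable_ddl_transaction!

  def change
    # Fail fast rather than queueing behind long-running queries.
    execute "SET lock_timeout = '5s'"
    add_index :appointments, :status, algorithm: :concurrently
  end
end

AddStatusIndexToAppointments.run
```

The same structure applies to the three-step column addition: one migration adds the column with no default, a batched backfill runs separately, and a final migration adds the default and constraint.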

The broader principle: treat migrations as deployments, not development tasks. Review them for production safety the same way you'd review a change to your load balancer configuration. Because at scale, a careless migration can do more damage than a bug in your application code.

Serialization That Scales Linearly When Your Data Doesn't

Serializing a handful of records into JSON is trivial. Serializing thousands of deeply nested records on every API request is one of the most quietly expensive operations in a Rails application.

The problem usually starts small. An endpoint returns a list of providers, each with their specialties, locations, and available time slots. At launch, there are 50 providers. The serialization takes a few milliseconds. A year later, there are 2,000 providers, each with nested associations three levels deep, and that same endpoint is spending 400ms just building the JSON response — after the database queries have already finished.

This is a scaling problem, not a tooling problem. Switching serializers helps at the margins, but the fundamental issue is that you're doing O(n) work on every request where n keeps growing.

The architectural fixes matter more than the library choice. Paginate aggressively — no endpoint should return unbounded result sets. Use sparse fieldsets so clients can request only the fields they need. Cache serialized responses at the HTTP level or the application level so you're not rebuilding the same JSON on every request. For endpoints that serve relatively static data, consider pre-computing and storing the serialized response when the underlying data changes rather than computing it on every read.

The shift in thinking is from "serialize on read" to "serialize on write." When your write-to-read ratio is 1:1000, it's dramatically more efficient to pay the serialization cost once on write and serve cached results on every read.
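A minimal "serialize on write" sketch. Names are illustrative, and in a Rails app the cached payload would live in Rails.cache or a database column rather than an instance variable:

```ruby
require "json"

class ProviderDirectory
  def initialize
    @providers = []
    @cached_json = "[]"
  end

  # Writes are rare: pay the serialization cost here, once per change.
  def add(provider)
    @providers << provider
    @cached_json = JSON.generate(@providers)
  end

  # Reads are frequent: return the precomputed string with no
  # per-request serialization work at all.
  def json_response
    @cached_json
  end
end
```

At a 1:1000 write-to-read ratio, the serialization cost drops from once per request to roughly once per thousand requests.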

The Common Thread

Every problem in this article shares the same root cause: something that works at small scale behaves differently at large scale, and the difference isn't visible until production.

The way to catch these problems before they catch you is to think in terms of multipliers. Every time you add a callback, ask: what does this cost when it fires 100,000 times a day? Every time you enqueue a job, ask: what happens when the queue is 50,000 deep? Every time you write a migration, ask: what does this do to a table with 10 million rows and 500 concurrent connections?

You won't catch everything. Nobody does. But the teams that think about scale before they're forced to are the ones that sleep at night while their applications handle the traffic.

Every Engagement Starts with an Assessment

We begin every project with a $2,000 flat-fee assessment to evaluate your codebase and deliver a fixed proposal. If you proceed, the assessment fee is deducted from the project cost.

Request an Assessment →