Posts

Observability in Distributed Systems

A user reports that the checkout page is slow. You open your monitoring dashboard. CPU is fine. Memory is fine. Request latency shows a bump. But which service caused it? The request passed through the API gateway, the auth service, the cart service, the pricing service, the inventory service, and the payment service. One of them is slow. Which one? In a monolith, you have one log. You search it. You find the slow function. Problem solved in minutes. ...

Data Modeling for Scale

Most developers learn data modeling from textbooks. Normalize everything to third normal form. Eliminate redundancy. One fact in one place. Every column depends on the key, the whole key, and nothing but the key. Then they build a real system. Queries take 200ms because joining five tables for every page load is expensive at scale. The application spends more time assembling data from normalized tables than doing anything useful. The database CPU is pegged at 90% on joins alone. ...

How Netflix Streams Without Downtime

Netflix serves over 250 million subscribers across 190 countries. It processes over a petabyte of data per day. It accounts for a significant percentage of global internet traffic. And its engineers deploy code thousands of times per day across hundreds of microservices with essentially zero downtime. This is not an accident. It is the result of deliberate architectural decisions designed around one principle above all others. Availability matters more than anything else. A user trying to watch a movie who sees an error may cancel their subscription. A worse movie recommendation is a minor annoyance. The system is built to stay up even when pieces of it fail. ...

Scaling Strategies

Your application serves 100 users. One server handles it fine. Then it serves 1,000. Still fine. Then 10,000. The server’s CPU hits 90%. Response times creep up. Database connections start timing out. You need to do something. That something is scaling. But scaling is not one thing. It’s a set of decisions, each with tradeoffs. The first decision is the simplest. Do you make the existing machine bigger, or do you add more machines? ...

CDN and Edge Computing

A user in Tokyo requests a web page hosted in Virginia. The request travels across the Pacific, through multiple routers, to the origin server. The server processes it. The response travels back. Total round trip? 300 milliseconds on a good day, 500 or more on a bad one. Every image, every script, every stylesheet makes the same journey. Now put a server in Tokyo that holds a copy of that static content. The user’s request travels to a local data center. The response comes back in 20 milliseconds. That’s the difference between a page that loads instantly and a page that feels broken. ...

Distributed Locking

Two servers try to withdraw money from the same bank account at the same time. Server A reads the balance as 1000. Server B also reads 1000. Server A subtracts 200 and writes 800. Server B subtracts 100 and writes 900. The final balance is 900, but it should be 700. Server A’s withdrawal of 200 was overwritten and lost. This is a classic race condition, often called a lost update. On a single machine, you fix this with a mutex. A lock. Only one thread can hold the lock at a time. The other waits. Problem solved. ...
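
The interleaving in the bank account example can be replayed step by step. The following is a minimal Python sketch, not code from the post: the function names `lost_update` and `locked_update` and the `Account` class are hypothetical, and two threads on one machine stand in for the two servers. The first function reproduces the bad ordering deterministically; the second guards the read-modify-write with a mutex.

```python
import threading


def lost_update():
    """Replay the interleaving from the example, one step at a time."""
    balance = 1000
    read_a = balance          # Server A reads 1000
    read_b = balance          # Server B also reads 1000
    balance = read_a - 200    # Server A writes 800
    balance = read_b - 100    # Server B writes 900, overwriting A's write
    return balance


class Account:
    """Hypothetical account whose withdrawals are serialized by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Only one thread at a time may read and then write the balance.
        with self._lock:
            current = self.balance
            self.balance = current - amount


def locked_update():
    """Run both withdrawals concurrently, protected by the mutex."""
    account = Account(1000)
    threads = [
        threading.Thread(target=account.withdraw, args=(amount,))
        for amount in (200, 100)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    print(lost_update())    # 900: one withdrawal is lost
    print(locked_update())  # 700: both withdrawals are applied
```

A `threading.Lock` only protects threads inside one process, which is why this fix works on a single machine and does not carry over directly to two separate servers.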

How Uber Matches Riders and Drivers in Real Time

You open the Uber app and request a ride. Within seconds, the app tells you about nearby drivers. You confirm. A driver accepts. A car appears on the map moving toward you. The whole process takes less than 30 seconds. Behind those 30 seconds is a system that must track the location of millions of drivers, match riders to drivers based on proximity and preference, recalculate routes in real time, adjust pricing dynamically, and handle payments across dozens of countries. All while dealing with network latency, GPS inaccuracy, and the inherent unpredictability of human behavior. ...

Rate Limiting

Your API handles 100 requests per second comfortably. A user writes a script that sends 10,000 requests per second. Your servers crawl. Legitimate users get timeouts. Your database connection pool exhausts. The entire system degrades because of one bad actor. This is why rate limiting exists. It’s not just about preventing abuse. It’s about protecting the system from itself. Every resource is finite. Rate limiting ensures no single consumer takes more than their fair share. ...

Message Queues

A user places an order. Your application needs to process payment, update inventory, send a confirmation email, update analytics, and trigger a fraud check. If the application calls each service directly and synchronously, what happens when the email service is slow? The user waits. What happens when analytics goes down? The whole chain breaks. What happens when traffic spikes on Black Friday? Every service in the chain must handle peak load simultaneously. ...

Consistent Hashing

You have 5 cache servers. You hash each user ID modulo 5 to decide which server holds their data. User 42 goes to server 2. User 87 goes to server 2 as well. Everything works. Then traffic grows. You add a 6th server. Now you hash modulo 6. User 42 hashes to server 0. User 87 hashes to server 3. Almost every user’s data is now on the wrong server. Your cache hit rate drops to near zero. Every request that would have been a cache hit becomes a cache miss. Your database drowns. ...
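
The modulo arithmetic in this example is easy to check. The sketch below is illustrative only: `server_for` and `fraction_moved` are hypothetical helper names, and it assumes the user ID itself serves as the hash value, which matches the numbers in the example (42 and 87) but is a simplification of a real hash function.

```python
def server_for(user_id, num_servers):
    """Pick a server by taking the user ID modulo the server count.

    Assumption: the ID is used directly as its own hash value.
    """
    return user_id % num_servers


def fraction_moved(num_keys, old_servers, new_servers):
    """Fraction of IDs in range(num_keys) that map to a different server
    after the server count changes."""
    moved = sum(
        1
        for user_id in range(num_keys)
        if server_for(user_id, old_servers) != server_for(user_id, new_servers)
    )
    return moved / num_keys


if __name__ == "__main__":
    for user_id in (42, 87):
        print(user_id, server_for(user_id, 5), server_for(user_id, 6))
    print(fraction_moved(30_000, 5, 6))
```

Under these assumptions, an ID stays on the same server only when its remainders modulo 5 and modulo 6 agree, which holds for 5 out of every 30 consecutive IDs. Roughly five in six keys move when the sixth server is added.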