Posts

How WhatsApp Handles Billions of Messages

WhatsApp serves over 2 billion users with fewer than 100 engineers. That ratio is absurd. A company the size of a small startup powering the world’s largest messaging platform. The architecture that makes this possible is worth understanding because the design decisions are deliberately different from what most teams would choose. The foundation of WhatsApp’s backend is Erlang and the BEAM virtual machine. Erlang was designed at Ericsson in the 1980s for telephone switches. These systems had specific requirements. They could never go down. They had to handle millions of simultaneous connections. They had to be updated without restarting. And they had to process messages with low, predictable latency. ...

Consensus Algorithms

Three generals are surrounding a city. They must attack at dawn or retreat. If they all attack, they win. If they all retreat, they live to fight another day. If some attack and some retreat, they are destroyed. They can only communicate by messenger. One general might be a traitor. How do they reach agreement? This is the Byzantine Generals Problem. And it cuts to the heart of distributed systems. ...

API Gateway

You have ten microservices. Each has its own URL, its own authentication logic, its own rate limiting. Your frontend team needs to know all ten URLs. Your mobile app needs to call ten different endpoints. When a service changes its address, every client breaks. Now imagine your system grows to fifty services. Then a hundred. The frontend is now managing a web of direct connections. Authentication is duplicated across every service. Rate limiting is inconsistent. Logging is scattered. Monitoring is a nightmare. ...
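A minimal sketch of the single entry point this post builds toward: one function that authenticates, rate-limits, and routes every request, so clients only ever need one address. Everything here is hypothetical and for illustration only: the `ROUTES` table, the service hostnames, the `API_KEYS` set, and the fixed-window limit of 5 requests per second. A real gateway does this over live network connections and forwards the request rather than returning a URL.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical route table: public path prefix -> internal service address.
ROUTES = {
    "/users": "http://user-service:8080",
    "/orders": "http://order-service:8080",
    "/payments": "http://payment-service:8080",
}

# Hypothetical credential store; a real gateway would verify signed tokens.
API_KEYS = {"key-123"}


@dataclass
class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second slot."""
    limit: int
    window: float
    counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def allow(self, client: str, now: float) -> bool:
        # Requests in the same window share one counter. Old windows are never
        # cleaned up here; a long-running version would need to evict them.
        key = (client, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit


limiter = FixedWindowLimiter(limit=5, window=1.0)


def handle(path: str, api_key: str, now: Optional[float] = None) -> Tuple[int, str]:
    """Authenticate, rate-limit, and route one request in a single place."""
    now = time.time() if now is None else now
    if api_key not in API_KEYS:
        return 401, "unauthorized"
    if not limiter.allow(api_key, now):
        return 429, "too many requests"
    for prefix, upstream in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return 200, upstream + path  # the URL the gateway would forward to
    return 404, "no route"
```

Because authentication and rate limiting live in `handle`, the services behind it no longer duplicate that logic, and moving a service means editing one entry in `ROUTES` instead of every client.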

Database Replication

Your database holds all your data on one machine. That machine dies. Your entire application goes down. Every user sees errors. Revenue stops. And there is nothing you can do until that machine comes back online. This is the availability problem. The fix is replication. Keep multiple copies of the same data on different machines. If one fails, another takes over. But replication is not just about copies. It raises real design questions. Who accepts writes? How do copies stay in sync? What happens when they fall behind? How do you handle conflicts when two copies disagree? ...
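A toy model of single-leader, asynchronous replication makes those questions concrete: only the leader accepts writes, it records them in an ordered log, and followers copy that log only when replication runs. The class names and the in-memory log are illustrative assumptions, not how any particular database implements it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Replica:
    name: str
    data: Dict[str, Any] = field(default_factory=dict)
    applied: int = 0  # how many log entries this copy has applied

    def apply(self, log: List[Tuple[str, Any]]) -> None:
        """Catch up by applying every log entry not yet seen, in order."""
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class Leader(Replica):
    log: List[Tuple[str, Any]] = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        # Only the leader accepts writes, and it records them in an ordered log.
        self.log.append((key, value))
        self.apply(self.log)

    def replicate_to(self, follower: Replica) -> None:
        # Asynchronous: a follower only catches up when this runs.
        follower.apply(self.log)

    def lag(self, follower: Replica) -> int:
        return len(self.log) - follower.applied
```

Until `replicate_to` runs, a read from the follower returns stale data. In this model, promoting a follower while `lag()` is above zero would silently lose the unreplicated writes, which is why some systems wait for at least one follower to acknowledge a write before confirming it (synchronous or semi-synchronous replication).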

How YouTube Serves Billions of Videos

Over 500 hours of video are uploaded to YouTube every minute. Over a billion hours of video are watched every day. The scale is hard to comprehend. Let’s break down how a system this large actually works. When you upload a video to YouTube, nothing about the experience suggests what’s happening behind the scenes. You watch a progress bar, the upload finishes, and eventually the video goes live. But in that gap, the system does an enormous amount of work. ...

Database Sharding

Your database has one server. It holds all your data. It works fine until it doesn’t. The CPU maxes out. Disk I/O crawls. Queries that took 5ms now take 500ms. You add more RAM. You upgrade the CPU. You get a bigger machine. But eventually, one machine cannot keep up. This is the vertical scaling ceiling. You make the box bigger until you cannot make it any bigger. The alternative is horizontal scaling. Instead of one giant database, you spread the data across multiple smaller databases. Each one holds a subset of the data. Each one handles a subset of the traffic. Together, they behave like one logical database. ...
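A minimal sketch of hash-based sharding: hash the shard key, take the result modulo the number of shards, and send the query to that shard. The shard names are hypothetical, and hashing is only one strategy; range-based and directory-based sharding are common alternatives.

```python
import hashlib
from typing import Sequence

# Hypothetical shard names; each one holds a subset of the rows.
SHARDS = ("db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3")


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Route a shard key (e.g. a user ID) to one shard.

    Uses SHA-256 rather than Python's built-in hash(), which is randomized
    per process for strings and would send the same key to different shards
    after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

`shard_for("user:42")` always returns the same shard, so every query for that user lands on one smaller database. One caveat with plain modulo: changing the number of shards remaps most keys and forces a large data migration. Consistent hashing is the usual way to limit that movement.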

Caching Strategies

Your database can handle 5,000 queries per second. Your users are sending 50,000. Most of those queries are asking for the same data over and over. The product page for that trending item. The user profile that hasn’t changed in weeks. The configuration that’s identical for every request. Do you really want to hit the database every time? Of course not. You cache. Caching is the act of storing a copy of data in a faster storage layer so that future requests can be served without going back to the source. The source could be a database, an external API, or a file system. The cache is something faster. Usually memory. Redis. Memcached. Even the browser. ...
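A minimal cache-aside sketch, assuming an in-process dictionary stands in for Redis or Memcached and a hypothetical `load_from_db` callable stands in for the real query. The application checks the cache first, falls back to the source on a miss, and stores the result with a time-to-live (TTL) so stale entries eventually expire.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache where each entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: evict it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def get_product(product_id: str, cache: TTLCache,
                load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the source on a miss.

    Note: a stored value of None is indistinguishable from a miss here.
    """
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(product_id)  # the slow path to the database
        cache.set(key, value)
    return value
```

The TTL bounds how stale a cached product page can get. When staleness matters more, the usual complement is to invalidate or overwrite the cache key whenever the underlying row is written.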

Load Balancing

Imagine a restaurant with one waiter. Ten tables arrive at once. The waiter panics, orders get mixed up, customers leave angry. Now hire three waiters and assign tables evenly. Everything flows. That’s load balancing. Your application server can handle maybe 10,000 requests per second. Your users are sending 100,000. One server will choke. The answer is not a bigger server. The answer is more servers, and something smart enough to distribute traffic between them. ...
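Round-robin is the simplest way to distribute traffic: send each new request to the next server in the rotation. This sketch adds a basic health flag so a dead server is skipped. The backend names are hypothetical, and real load balancers detect failures with periodic health checks rather than manual `mark_down` calls.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, skipping ones marked unhealthy."""

    def __init__(self, backends: Iterable[str]):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        self.healthy.add(backend)

    def next_backend(self) -> str:
        # Try at most one full rotation so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

Round-robin assumes requests cost roughly the same. When some requests are much heavier than others, strategies such as least-connections usually spread the load more evenly.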

CAP Theorem

The CAP theorem is one of those ideas everyone knows but very few people actually design with. That’s because it’s usually taught as a formula, not as a trade-off you make under pressure. So let’s approach it the way a system designer would. Assume you’re building a distributed system. Not a single server, not a single database. Multiple nodes, multiple machines, talking over a network. The moment you distribute a system, one thing becomes inevitable: things will fail. Machines crash. Networks slow down. Packets get lost. You don’t get to opt out of this. ...

Event-Driven Architecture

Let’s talk about Event-Driven Architecture. Imagine you are building an e-commerce platform. A customer places an order. What should happen next? Payment should be processed. Inventory should be updated. An email confirmation should be sent. Analytics should record the purchase. Maybe fraud detection should run. Now be honest: should the Order Service call all of these services directly, one by one? What happens if the Email Service is down? Should order placement fail? What if Analytics is slow? Should the customer wait? This is where tight coupling starts to hurt. ...
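A minimal in-process event bus illustrates the decoupling: the order service publishes an `OrderPlaced` event and never calls email, inventory, or analytics directly. The names `EventBus`, `OrderPlaced`, and `place_order` are hypothetical. This sketch runs handlers synchronously in the same process, so it shows isolation from failures but not from slowness; a real system would put a broker such as Kafka or RabbitMQ in between so consumers work asynchronously and failed deliveries can be retried.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """In-process publish/subscribe: publishers don't know who is listening."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        # (event type, handler name) for every delivery that raised an error.
        self.failed: List[Tuple[str, str]] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            try:
                handler(payload)
            except Exception:
                # A broken consumer must not fail the publisher or the others.
                self.failed.append((event_type, handler.__name__))


def place_order(bus: EventBus, order_id: str, amount: float) -> str:
    # The order service announces what happened instead of calling
    # payment, inventory, email, and analytics one by one.
    bus.publish("OrderPlaced", {"order_id": order_id, "amount": amount})
    return order_id
```

If the email handler raises because the Email Service is down, `place_order` still returns, the other handlers still run, and the failure is recorded in `bus.failed` where a retry job could pick it up.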