Database Architectures, Indexing & Sharding

Why "just pick Postgres" isn't always the answer, how B-Trees and LSM-Trees actually change your write performance, and what breaks when you shard a database without thinking about your query patterns first.

You're designing the data layer for a social app. User profiles: name, email, bio, a handful of fields that rarely change and need to stay consistent — a user should never see their own profile half-updated. Activity feed: millions of "user X liked post Y" events per hour, written constantly, read in bulk, and honestly fine if a write shows up a second late.

If you reach for the same database for both, you'll eventually regret it. This is the actual decision behind "SQL vs. NoSQL" — it's not a religious debate, it's a question of what your data looks like and how you need to read and write it.

SQL vs. NoSQL Paradigm

Dimension	SQL (e.g., PostgreSQL, MySQL)	NoSQL (e.g., MongoDB, Cassandra)
Data Model	Tabular (rows & columns) with static schemas.	Document, Key-Value, Columnar, or Graph.
Transactions	Strong ACID (Atomicity, Consistency, Isolation, Durability).	Typically BASE (Basically Available, Soft state, Eventual consistency).
Scaling	Primarily vertical (more CPU/RAM). Horizontal scaling usually means read replicas; sharding writes typically needs extra tooling or application-level partitioning.	Horizontal: built-in sharding spreads high read/write volumes across nodes.
Joins	Native support for multi-table relationships.	Denormalization, or joins performed in application code.

For that user profile table, Postgres is the obvious choice — you want ACID guarantees and the data is genuinely relational. For the activity feed, a columnar or key-value store built for write throughput is a better fit, because you're never going to run a complex JOIN against "who liked what" — you're going to append and scan.
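"A user should never see their own profile half-updated" is the Atomicity in ACID. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres because it needs no server; the `profiles` table and the `update_profile` helper are hypothetical. It simulates a crash between two updates and shows that the first update is rolled back along with the second.

```python
import sqlite3

def update_profile(conn, user_id, name, email, fail_midway=False):
    """Change two profile fields as one atomic unit; returns True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE profiles SET name = ? WHERE id = ?", (name, user_id))
            if fail_midway:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute("UPDATE profiles SET email = ? WHERE id = ?", (email, user_id))
        return True
    except RuntimeError:
        return False  # rolled back: neither field was changed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'Ada', 'ada@old.example')")
conn.commit()

update_profile(conn, 1, "Ada L.", "ada@new.example", fail_midway=True)
print(conn.execute("SELECT name, email FROM profiles WHERE id = 1").fetchone())
# ('Ada', 'ada@old.example'): the first UPDATE was rolled back too
```

An activity-feed write has no partner write that must land with it, which is why giving up this guarantee there costs little.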

Indexing Structures: B-Trees vs. LSM-Trees

Once you've picked a database, the next thing that actually determines your performance under load is the indexing structure underneath it — and this is where a lot of "why is Postgres slow on writes but Cassandra isn't" confusion comes from.

1. B-Tree (Optimized for Reads)

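Relational databases like Postgres and MySQL use B-Trees (specifically B+ Trees) for their indexes; MySQL's InnoDB engine also stores the table rows themselves in a clustered B+ Tree keyed by the primary key.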

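Structure: A self-balancing tree of fixed-size pages (blocks) on disk. Each page holds many keys plus pointers to child pages, so the high fan-out keeps even very large indexes only a few levels deep.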
Complexity: $O(\log N)$ lookup, insert, and delete.
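Write Path: Updates pages in place, which means random disk I/O, and even a one-row change rewrites a whole page. If the target page is full, it splits into two half-full pages and the parent gains a new separator key. The partly empty pages this leaves behind (fragmentation), plus the random I/O, make writes slower under heavy write load.
Best For: Read-heavy workloads, range queries, and transactional workloads that need strict locking: each key lives in exactly one place in the tree, which makes locking keys and key ranges straightforward.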
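The page split in the write path can be sketched for a single leaf page. This is a simplified illustration, not how Postgres or InnoDB implement it: a page is modeled as a sorted Python list, and `PAGE_CAPACITY` and `insert_into_leaf` are hypothetical names chosen for the example.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical; real pages are several KB and hold hundreds of keys

def insert_into_leaf(page, key):
    """Insert key into a sorted leaf page, splitting the page if it overflows.

    Returns (left_page, right_page, separator) after a split,
    or (page, None, None) if the key fit.
    """
    bisect.insort(page, key)  # in-place update: keys must stay sorted within the page
    if len(page) <= PAGE_CAPACITY:
        return page, None, None
    mid = len(page) // 2
    left, right = page[:mid], page[mid:]
    # In a B+ tree, the first key of the new right page is copied up into the
    # parent as a separator, so the parent page must be rewritten too.
    return left, right, right[0]

page = [10, 20, 30, 40]          # a full leaf
left, right, separator = insert_into_leaf(page, 25)
print(left, right, separator)    # [10, 20] [25, 30, 40] 25
```

One insert touched three pages (two leaves and the parent), and both leaves now have free space that may never be filled, which is the fragmentation and write cost described above.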

2. LSM-Tree (Optimized for Writes)

NoSQL stores like Cassandra and RocksDB use Log-Structured Merge Trees.

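Write Path: Each write is first appended to a write-ahead log (WAL) for crash recovery, then inserted into a sorted in-memory buffer (the MemTable). When the MemTable fills up, it's flushed to disk as an immutable SSTable (Sorted String Table). Disk writes are sequential; nothing is updated in place.
Compaction: A background process merges SSTables, discarding values that were later overwritten and purging tombstones (the markers an LSM-Tree writes in place of deleted keys).
Read Path: Checks the MemTable first, then searches SSTables from newest to oldest, often using Bloom Filters to skip files that definitely don't contain the key. A key that doesn't exist at all is the worst case, since every file has to be ruled out.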
Best For: High-throughput write workloads — logging, metrics, clickstreams, activity feeds.

This is exactly why the activity feed example above fits an LSM-Tree-backed store better: it's almost pure appends, which is the one thing LSM-Trees are built to do fast.

Scaling: Replication vs. Sharding

When a database becomes too slow or runs out of disk capacity, replication and sharding solve two different problems — and mixing them up is a common planning mistake.

1. Database Replication

Copies database state across multiple nodes. This scales reads and adds resilience — it does nothing for write capacity or storage capacity.

Primary-Replica (Master-Slave): All writes go to the primary. Reads can go to replicas. Useful when your workload is read-heavy.
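Multi-Primary: Any node can accept writes. Because two nodes can accept conflicting writes for the same row at the same time, this needs conflict handling: vector clocks can detect that two updates were concurrent, and then a rule such as last-write-wins or an application-level merge decides which value survives.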
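To make the vector-clock idea concrete, here is a minimal sketch, assuming each row version carries a dict mapping node IDs to write counters. The helper names `increment`, `compare`, and `merge` are hypothetical. The clock only tells you whether two versions are ordered or concurrent; resolving a concurrent pair is still up to the system or the application.

```python
def increment(vc, node):
    """Return a copy of clock vc with node's counter bumped (a local write)."""
    updated = dict(vc)
    updated[node] = updated.get(node, 0) + 1
    return updated

def compare(vc_a, vc_b):
    """Order two clocks: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither saw the other's write: a conflict

def merge(vc_a, vc_b):
    """Clock for a resolved version: element-wise max, so it dominates both inputs."""
    return {n: max(vc_a.get(n, 0), vc_b.get(n, 0)) for n in set(vc_a) | set(vc_b)}

v1 = increment({}, "A")      # first write lands on primary A
on_a = increment(v1, "A")    # A updates the row again
on_b = increment(v1, "B")    # meanwhile, primary B updates the same row
print(compare(v1, on_a))     # before
print(compare(on_a, on_b))   # concurrent
```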

2. Database Sharding (Horizontal Partitioning)

Splits data by row across multiple servers based on a Shard Key. This is what actually scales writes and total storage, at the cost of a lot of new complexity.

[ App Server ]
      |
      +---> Shard Key: ID % 2 == 0 ---> [ Database Node A (IDs 0, 2, 4) ]
      |
      +---> Shard Key: ID % 2 == 1 ---> [ Database Node B (IDs 1, 3, 5) ]
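The routing in the diagram fits in a few lines. The sketch below is a toy, assuming each shard is a Python list standing in for a database node and rows have hypothetical `user_id` and `email` fields. It also previews the cost that shows up later: a lookup by the shard key contacts one node, while a lookup by any other field has to scatter to every node and gather the results.

```python
class ShardedStore:
    """Toy router: rows are placed on shard user_id % num_shards."""

    def __init__(self, num_shards=2):
        self.shards = [[] for _ in range(num_shards)]  # each list stands in for one node
        self.last_shards_queried = 0

    def _shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert(self, row):
        self._shard_for(row["user_id"]).append(row)

    def get_by_user(self, user_id):
        # The query includes the shard key: exactly one node is contacted
        self.last_shards_queried = 1
        return [r for r in self._shard_for(user_id) if r["user_id"] == user_id]

    def find_by_email(self, email):
        # The query lacks the shard key: scatter to every node, then gather
        self.last_shards_queried = len(self.shards)
        results = []
        for shard in self.shards:
            results.extend(r for r in shard if r["email"] == email)
        return results

store = ShardedStore(num_shards=2)
for uid in range(6):
    store.insert({"user_id": uid, "email": f"user{uid}@example.com"})
print([r["user_id"] for r in store.shards[0]])   # [0, 2, 4]
```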


Sharding Algorithms

Range-Based: Grouping keys by a numerical range (e.g., users with IDs 1–10000). Simple to reason about, but creates hot spots — if your newest, most active users all land in the highest range, that one shard takes disproportionate load.
Hash-Based: Applying a hash function to the key (e.g., hash(UserID) % NumberOfShards). Even distribution, but adding a shard means the modulo changes and almost all your data needs to be rehashed and moved.
Consistent Hashing: Maps both shards and keys onto a hash ring, and each key belongs to the first shard found moving clockwise from its position. Adding a node only moves the keys in the arc it takes over, roughly 1/N of the data for N nodes. This is the fix for the "adding one shard reshuffles everything" problem above, and it's the same technique some load balancers use, covered in the next chapter.
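The difference between hash-mod-N and consistent hashing is easy to measure. This sketch is illustrative only: `stable_hash`, `mod_n_shard`, and `ConsistentHashRing` are hypothetical names, MD5 is used purely as a stable, well-spread hash (Python's built-in `hash()` is randomized per process for strings), and each physical node gets 100 "virtual node" points on the ring to even out its share.

```python
import bisect
import hashlib

def stable_hash(value):
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

def mod_n_shard(key, num_shards):
    return stable_hash(key) % num_shards

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._nodes = []      # node owning each position (kept parallel to _positions)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at many pseudo-random points so its share of keys evens out
        for i in range(self.vnodes):
            pos = stable_hash(f"{node}#{i}")
            idx = bisect.bisect_left(self._positions, pos)
            self._positions.insert(idx, pos)
            self._nodes.insert(idx, node)

    def node_for(self, key):
        # First node position at or after the key's hash, wrapping around the ring
        idx = bisect.bisect_left(self._positions, stable_hash(key)) % len(self._positions)
        return self._nodes[idx]

keys = range(10_000)

# Hash-mod-N: grow from 4 to 5 shards
moved_mod = sum(mod_n_shard(k, 4) != mod_n_shard(k, 5) for k in keys)

# Consistent hashing: add a fifth node "E" to a 4-node ring
ring = ConsistentHashRing(["A", "B", "C", "D"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("E")
moved_ring = sum(before[k] != ring.node_for(k) for k in keys)

print(moved_mod, moved_ring)
```

For mod-N, a key keeps its shard only when `h % 4 == h % 5`, which holds for 4 of every 20 hash residues, so about 80% of keys move. On the ring, you should expect roughly a fifth of the keys to move, and every one of them moves to the new node "E"; no key shuffles between the existing nodes.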

Key Takeaways

SQL vs. NoSQL isn't a general preference — it's a question of whether your data is relational and needs strict consistency (SQL) or is high-volume and append-heavy (NoSQL).
B-Trees optimize for reads and range queries; LSM-Trees optimize for write throughput. Your write pattern should decide which one you need, not which database is trendier.
Replication scales reads and adds failover; sharding scales writes and storage. They solve different problems and most large systems eventually need both.
Sharding only helps if your queries actually use the shard key — otherwise you've just turned every cross-cutting query into a scatter-gather across every node.

Next Steps

Sharding and replication both depend on routing requests to the right place reliably — which is exactly what a load balancer does at the traffic layer. That's next.