Database Architectures, Indexing & Sharding
Explore the internal indexing structures of relational databases, distributed scaling, and NoSQL storage layout trade-offs.
Choosing and scaling a database is one of the most critical decisions in system design. We must choose between relational (SQL) databases with strong ACID guarantees, and non-relational (NoSQL) databases offering high horizontal scalability.
SQL vs. NoSQL Paradigm
| Dimension | SQL (e.g., PostgreSQL, MySQL) | NoSQL (e.g., MongoDB, Cassandra) | |---|---|---| | Data Model | Tabular (rows & columns) with static schemas. | Document, Key-Value, Columnar, or Graph. | | Transactions | Strong ACID (Atomicity, Consistency, Isolation, Durability). | BASE (Basically Available, Soft state, Eventual consistency). | | Scaling | Vertical (scale up CPU/RAM). Horizontal replication is read-heavy. | Horizontal (sharding out of the box) for high write/read volumes. | | Joins | Native support for multi-table relationships. | Denormalization or manual application-level queries. |
Indexing Structures: B-Trees vs. LSM-Trees
To retrieve data quickly, databases construct indexes. The underlying data structure changes the performance profile dramatically:
1. B-Tree (Optimized for Reads)
Relational databases like Postgres and MySQL use B-Trees (specifically B+ Trees).
- Structure: Self-balancing search trees where data nodes are structured in blocks on disk.
- Complexity: $O(\log N)$ lookup, insert, and delete.
- Write Path: Requires random updates to disk blocks. If a block is full, it splits. This results in disk fragmentation and slower writes.
- Best For: Read-heavy workloads, range queries, and strict transaction locks.
2. LSM-Tree (Optimized for Writes)
NoSQL stores like Cassandra and RocksDB use Log-Structured Merge Trees.
- Write Path: Writes are appended to an in-memory buffer (MemTable) and a recovery log (WAL). When full, the MemTable is flushed to disk as an immutable SSTable (Sorted String Table).
- Compaction: A background process merges SSTables, removing duplicate updates and tombstoned deletions.
- Read Path: Checks the MemTable first, then searches SSTables (often using Bloom Filters to avoid empty searches).
- Best For: High-throughput write workloads (e.g., logging, metrics, clickstreams).
Scaling: Replication vs. Sharding
When a database becomes too slow or runs out of disk capacity, we must scale.
1. Database Replication
Copies database state across multiple nodes.
- Primary-Replica (Master-Slave): All writes go to the Primary. Reads can go to the Replicas. Useful for read scaling.
- Multi-Primary: Any node can accept writes. Requires complex conflict resolution logic (e.g., vector clocks).
2. Database Sharding (Horizontal Partitioning)
Splits data by row across multiple servers based on a Shard Key.
[ App Server ]
|
+---> Shard Key: ID % 2 == 0 ---> [ Database Node A (IDs 0, 2, 4) ]
|
+---> Shard Key: ID % 2 == 1 ---> [ Database Node B (IDs 1, 3, 5) ]
,[object Object],
Sharding Algorithms
- Range-Based: Grouping keys by a numerical range (e.g., users with IDs 1-10000). Simple, but creates hot-spots.
- Hash-Based: Applying a hash function to the key (e.g.,
hash(UserID) % NumberOfShards). Even distribution, but adding shards requires rehashing all data. - Consistent Hashing: Maps shards and keys to a ring. When adding nodes, only a fraction of keys need to be moved.