Relational Databases (RDBMS)
Relational databases organize data in tables with rows and columns, linked by foreign keys. They've been the default for decades because they offer something powerful: ACID guarantees.
ACID Properties
| Property | What it means | Why it matters |
|---|---|---|
| Atomicity | A transaction either fully completes or fully rolls back | No partial writes — money doesn't vanish mid-transfer |
| Consistency | Every transaction moves the DB from one valid state to another | Constraints (unique keys, foreign keys) are always enforced |
| Isolation | Concurrent transactions don't interfere with each other | Two users updating the same row don't corrupt each other's writes |
| Durability | Once committed, data survives crashes | Power failure doesn't lose your committed transactions |
When to use RDBMS: Structured data with clear relationships, complex queries involving JOINs, transactions that require strong consistency. Think e-commerce orders, user accounts, financial records.
NoSQL Databases
NoSQL isn't a single technology — it's a category of databases that trade some relational features for specific strengths. The four main types solve different problems:
Simple get/set by key. Blazing fast, limited query capability.
Use for: Caching, session storage, feature flags
Examples: Redis, Memcached, DynamoDB
Store semi-structured documents (JSON/BSON). Flexible schema, nested data.
Use for: Content management, user profiles, catalogs
Examples: MongoDB, CouchDB, Firestore
Columns grouped into families. Optimized for queries over large datasets.
Use for: Time series, IoT data, analytics
Examples: Cassandra, HBase, Bigtable
Data as nodes and edges. Optimized for traversing relationships.
Use for: Social graphs, recommendation engines, fraud detection
Examples: Neo4j, Amazon Neptune, Dgraph
BASE Properties
While RDBMS targets ACID, many NoSQL systems follow BASE:
- Basically Available: The system guarantees some level of availability, even during failures.
- Soft state: The system's state may change over time, even without new input (due to background replication).
- Eventually consistent: Given enough time without new writes, all replicas converge.
Scaling Patterns
Replication
Scales reads. Leader is the write bottleneck.
Scales writes. Needs conflict resolution.
Federation (Functional Partitioning)
Federation splits your monolithic database into multiple databases by function. Instead of one giant database handling users, products, orders, and analytics, you give each domain its own database.
Upside: Smaller databases, less replication lag, independent scaling. Downside: Cross-domain JOINs require application-level logic, and you need more infrastructure.
Sharding (Horizontal Partitioning)
Sharding distributes rows of a single table across multiple databases based on a shard key. If you shard users by user_id % 4, users 1, 5, 9 go to shard 0; users 2, 6, 10 go to shard 1; and so on.
Denormalization
Normalization removes data duplication. Denormalization adds it back intentionally. Why? Because JOINs are expensive at scale.
Instead of joining users and orders tables on every page load, you store the user's name directly in the orders table. The data is duplicated, but reads are faster because they hit a single table.
Trade-off: Faster reads, slower and more complex writes (you need to update the duplicated data everywhere). Use denormalization when read performance is critical and your data doesn't change frequently.
SQL Tuning Essentials
- Indexes: An index on a column is like a book's index — it lets the DB find rows without scanning the entire table. But indexes slow down writes and consume storage. Index columns you query by, not everything.
- Query analysis: Use
EXPLAIN ANALYZEto see how your database executes a query. Look for sequential scans on large tables — that's your DB reading every single row. - Connection pooling: Opening a DB connection is expensive. Use a pool (like PgBouncer) that maintains pre-opened connections and lends them to your application.
- Avoid N+1 queries: If you fetch 100 users and then make 100 queries to get each user's orders, that's 101 queries. Use JOINs or batch fetching instead.