PostgreSQL is excellent—until it isn't. At some point, your single database hits a wall. Here's how to know when you need sharding, and how to do it right.
Sharding is a double-edged sword: it solves single-node scaling limits but introduces cross-shard consistency, query complexity, and operational overhead.
Signs You Need Sharding
Your situation calls for sharding if:
- Single table > 100GB and hot (write-heavy)
- > 1 billion rows in a single table
- Vertical scaling exhausted: Biggest instance still can't keep up
Before You Shard, Exhaust These Alternatives
| Alternative | When It Works |
|---|---|
| Better indexing | 80% of performance issues are missing indexes |
| Read replicas | 100K QPS reads → 1 primary + 5 replicas |
| Caching | Redis in front—90% requests from cache |
| Table partitioning | Time-series data by month |
PostgreSQL Partitioning: The First Step
Three Strategies
-- HASH PARTITIONING: Distribute by hash CREATE TABLE users ( user_id BIGSERIAL, name TEXT ) PARTITION BY HASH (user_id); -- RANGE PARTITIONING: Distribute by range (time-series) CREATE TABLE events ( id BIGSERIAL, created_at TIMESTAMP NOT NULL ) PARTITION BY RANGE (created_at); -- LIST PARTITIONING: Distribute by discrete values CREATE TABLE orders ( order_id BIGSERIAL, country TEXT ) PARTITION BY LIST (country);
When to Use Each
| Strategy | Best For | Query Pattern |
|---|---|---|
| Hash | Even distribution, random access | Single-shard lookups |
| Range | Time-series, archival | Date-range queries |
| List | Categorical data | IN queries by category |
Citus: PostgreSQL Extension for Horizontal Scaling
Citus transforms a single PostgreSQL into a distributed system—transparently.
Setup
CREATE EXTENSION citus; SELECT citus_add_node('worker1.example.com', 5432); SELECT citus_add_node('worker2.example.com', 5432);
Creating Distributed Tables
CREATE TABLE events ( tenant_id BIGINT NOT NULL, event_id BIGINT NOT NULL, event_data JSONB ); SELECT create_distributed_table('events', 'tenant_id');
The Cross-Shard Query Problem
This is where sharding gets painful.
Fast: Single-Shard Query
-- Goes to ONE shard (fast, ~10ms) SELECT * FROM orders WHERE user_id = 42;
Slow: Cross-Shard Query
-- Fans out to ALL shards (slow, ~500ms) SELECT COUNT(*) FROM orders GROUP BY country;
Design queries to include the distribution key.
TL;DR
- Partitioning: Start with PostgreSQL native (range, list, hash)
- Citus: Use when single-node limits hit
- Cross-shard pain: Always design for single-shard queries
- Alternatives first: Indexes, replicas, caching
Ready to scale? Start with partitioning, move to Citus, only then consider application-level sharding.