Zero-Downtime Migrations: Evolving a Database Without Breaking Anything
How we use Liquibase, backward-compatible schema changes, and careful deploy ordering to evolve our PostgreSQL schema without downtime.
Database migrations are the highest-risk operation in any production service. A bad migration means downtime, data loss, or a midnight rollback. Here's how we approach schema changes at hrva.cc.
The Rules of Zero-Downtime Migrations
Every migration follows three rules: additive first (add columns before code reads them), backward-compatible (old code still works with new schema), and reversible (every changeset has a known rollback path).
Liquibase Changelogs
All schema changes live in src/main/resources/db/changelog/ as numbered XML files.
The master changelog at dbchangelog.xml includes them in order. Each changeset has
a unique id and author, and migrations run automatically on application
startup. If changeset 008 hasn't been applied, it runs before the application context finishes loading.
The Liquibase changelog covers the full schema evolution — from the initial users table through URL tracking, API keys, password resets, Safe Browsing columns, IP address tracking, and the recent audit log and email log tables.
Add Columns Before Code Reads Them
When adding a new column, we deploy the migration first — before the application code that references it. During a rolling deploy, old application instances continue running against the new schema. An old instance that doesn't know about the new column simply ignores it. A new instance that expects the column finds it already there.
Removing Columns: The Reverse
Dropping a column happens in reverse: first deploy code that stops referencing it, wait for all instances to pick up the change, then deploy the migration that drops it. The gap between the two deploys is the safety window. We typically wait a full deploy cycle (all instances confirmed on the new version) before executing destructive operations.
The 🪄 Moment
The trickiest migration was adding lastSafeBrowsingCheck to a table with tens of
thousands of rows — a nullable column with no default. The ALTER TABLE was instant
on PostgreSQL (metadata-only for nullable columns with no default), but backfilling the timestamp
for existing rows took 20 minutes. We used a background batch job so the application stayed
responsive.