I think everybody agrees that incremental backup per database, rather than cluster-wide, would be nice, and it would be nice if PG supported it. But given the way PG is architected, with cluster-wide WAL, that is not an easy task to implement. Repeating "other databases have it" doesn't change that.
--
Andreas Joseph Krogh
I am not advocating for database-level incremental backups, but all databases that have them also have cluster-wide/instance-wide WAL. Cluster-wide WAL does not make database-level incremental backups hard. Both Oracle and DB2 have database-level incremental backups, and both have cluster-wide WAL (redo logs). So does SQL Server, which is a bit specific because its log is used both for recovery, as WAL, and for undo, so it is very different from Postgres. Also, SQL Server is, to my knowledge, the only one of those databases that doesn't use MVCC.
The technology that all of those databases employ is a bitmap with one bit per database block. A full backup sets all bits to 0, and whenever a block is modified, the corresponding bit is set to 1. The backup tool, in incremental mode, then copies only the blocks whose bit is 1.
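Here is a minimal sketch of that bitmap ("block change tracking") mechanism in Python; the class and method names are illustrative, not any vendor's actual interface:

```python
# A minimal sketch of the bitmap ("block change tracking") mechanism
# described above: one bit per database block, cleared on a full
# backup, set on every write. Names are illustrative only.
class ChangeTracker:
    def __init__(self, n_blocks: int):
        self.dirty = [False] * n_blocks          # one bit per database block

    def on_full_backup(self) -> None:
        # a full backup resets every bit to 0
        self.dirty = [False] * len(self.dirty)

    def on_block_write(self, block_no: int) -> None:
        # any modification sets the block's bit to 1
        self.dirty[block_no] = True

    def blocks_for_incremental(self) -> list[int]:
        # the incremental backup copies only the blocks whose bit is set
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

tracker = ChangeTracker(n_blocks=1_000_000)
tracker.on_block_write(42)
tracker.on_block_write(7)
print(tracker.blocks_for_incremental())  # [7, 42]
```

The bookkeeping itself is cheap; the operational headaches come from everything built around it, as described next.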
I am not too thrilled by that implementation. In particular, with Oracle there were bugs involving database restore, the "duplicate database" operation, global cache locks, and instance latches, and I've had quite a few headaches with RMAN cumulative incremental backups. My preferred backup method is a storage snapshot. Snapshots can then be replicated to other arrays (NetApp SnapVault, Hitachi HUR, EMC SRDF) or backed up to deduplicated offline storage like AWS Glacier or EMC Data Domain using simple file-level utilities. Once a snapshot is taken, it is a read-only file system and its files are no longer held open by the database processes.
Classic file-level backup tools like pg_basebackup or RMAN are pretty much unusable once your database hits 50 TB or so. With a 50 TB database, your RTO (Recovery Time Objective) will be measured in days: you can only push around 3 TB/hour down a 10 Gbit Ethernet pipe. On the other hand, companies like Facebook, Amazon, Walmart or Target lose hundreds of thousands of USD per hour of downtime; downtime of 15 hours or longer is completely unacceptable. Backup is only the last line of defense and should be used only if everything else fails. And if you have to use a file-level tool like pg_basebackup and tar to restore your VLDB, you should also get your resume ready.
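The arithmetic is easy to check; a quick back-of-the-envelope calculation, assuming the ~3 TB/hour sustained rate above:

```python
# Back-of-the-envelope restore time for a 50 TB database over a
# 10 Gbit/s link. 10 Gbit/s is ~1.25 GB/s (~4.5 TB/hour) in theory;
# ~3 TB/hour is a realistic sustained rate once protocol and disk
# overhead are paid.
db_size_tb = 50
throughput_tb_per_hour = 3
copy_hours = db_size_tb / throughput_tb_per_hour
print(f"~{copy_hours:.1f} hours just to copy the files back")  # ~16.7 hours
# WAL replay to the recovery target comes on top of the copy time.
```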
Incremental backups are awkward and ungainly. You have to restore the full backup and all incrementals taken since that full, and after that you must apply the remaining WAL files. So, by definition, restoring an incremental backup takes longer than restoring a full backup.
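A toy model of the restore sequence makes the point concrete; the function and file names below are purely illustrative, not any real tool's API:

```python
# The incremental restore path is the full-restore path plus two
# extra phases, which is why it cannot be faster than a full restore.
def restore_full(full_backup: str) -> None:
    print(f"restoring full backup {full_backup}")

def restore_incremental_chain(full_backup: str,
                              incrementals: list,
                              wal_files: list) -> None:
    restore_full(full_backup)              # the same work as a full restore...
    for inc in incrementals:               # ...plus every incremental since it
        print(f"applying incremental {inc}")
    for wal in wal_files:                  # ...plus the remaining WAL replay
        print(f"replaying WAL segment {wal}")

restore_incremental_chain("base.tar", ["inc_mon", "inc_tue"],
                          ["000000010000000000000042"])
```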
So, if you have to take backups, get yourself a deduplicating backup utility. Deduplication only stores the backup blocks that differ from the previous backup; in other words, you can take a daily full backup with the space expenditure of an incremental one. Also, if you need backups for regulatory reasons (HIPAA, SOX), incremental backups will not do.
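A minimal sketch of that deduplication idea, assuming fixed-size blocks and SHA-256 content hashes as block identity (real appliances like Data Domain use variable-size chunking, but the space math is the same):

```python
# A daily "full" backup that repeats yesterday's blocks stores only
# references, not data: new space is consumed only by changed blocks.
import hashlib

BLOCK_SIZE = 8192  # PostgreSQL's default block size

def backup(path: str, store: dict) -> list:
    """Back up one file into an append-only {hash: block} store.
    Returns the ordered block-hash manifest needed to restore it."""
    manifest = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in store:   # only new or changed blocks cost space
                store[digest] = block
            manifest.append(digest)
    return manifest

def restore(manifest: list, store: dict, path: str) -> None:
    # a single pass over the manifest; no incremental chain to replay
    with open(path, "wb") as f:
        for digest in manifest:
            f.write(store[digest])
```

A second "full" backup of a mostly unchanged file adds almost nothing to the store, yet restoring it is a single pass over its manifest, with no chain of incrementals to apply.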
Long story short, there are much more important things to do in Postgres than database-level incremental backups. BTW, Chuck Norris can take an incremental database-level backup of a Postgres database and recover the database without restoring the preceding full backup. I am not sure whether Chuck Norris is a member of the Postgres steering committee.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com