Hello list!

I'm using 9.1.5 on Ubuntu 11.10, in a streaming replication scenario. On my slave, recovery.conf states:

standby_mode = on
restore_command = '/usr/local/omnipitr/bin/omnipitr-restore -D /var/lib/postgresql/9.1/main/ --source gzip=/data/dbanalytics-wal/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l /var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose --error-pgcontroldata hang "%f" "%p"'
archive_cleanup_command = '/usr/local/omnipitr/bin/omnipitr-cleanup --verbose --log /var/log/omnipitr/cleanup-^Y-^m-^d.log --archive gzip=/data/dbanalytics-wal/ "%r"'
primary_conninfo = 'host=master port=5432 user=dbrepl password=password'

I ran out of disk space on the slave because the archived WAL segments were not removed. The documentation for archive_cleanup_command states [1]:

"This optional parameter specifies a shell command that will be executed at every restartpoint."

The slave's last reference to a restartpoint was:

2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG: restartpoint complete: wrote 58947 buffers (22.5%); 0 transaction log file(s) added, 0 removed, 103 recycled; write=539.537 s, sync=0.088 s, total=540.221 s; sync files=122, longest=0.087 s, average=0.000 s
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG: recovery restart point at E8F/F71DA1E0
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - DETAIL: last completed transaction was at log time 2012-11-28 17:15:51.275427+00
2012-11-28 21:19:24.245 UTC - svanalytics@svanalytics_staging 3476 (00000) 2012-11-28 19:55:50 UTC - LOG: duration: 4984689.535 ms statement: SELECT ...

OmniPITR's restore log ends with:

2012-11-28 13:54:13.396142 +0000 : 26378 : omnipitr-restore : LOG : Timer [Copying segment 0000000200000E8E00000069 to /var/lib/postgresql/9.1/main/pg_xlog/RECOVERYXLOG] took: 0.238s
2012-11-28 13:54:13.485916 +0000 : 26378 : omnipitr-restore : LOG : Segment 0000000200000E8E00000069 restored
2012-11-28 13:54:13.787225 +0000 : 26384 : omnipitr-restore : LOG : Called with parameters: -D /var/lib/postgresql/9.1/main/ --source gzip=/data/dbanalytics-wal/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l /var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose --error-pgcontroldata hang 0000000200000E8E0000006A pg_xlog/RECOVERYXLOG
2012-11-28 13:54:13.802772 +0000 : 26384 : omnipitr-restore : FATAL : Requested file does not exist, and it is streaming replication environment. Dying.

And OmniPITR's cleanup log ends with:

2012-11-28 20:18:11.237740 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F2.gz removed.
2012-11-28 20:18:11.256186 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F3.gz removed.
2012-11-28 20:18:11.258942 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F4.gz removed.
2012-11-28 20:18:11.261542 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F5.gz removed.
2012-11-28 20:18:11.263758 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F6.gz removed.
2012-11-28 20:18:11.265259 +0000 : 10384 : omnipitr-cleanup : LOG : 34 segments removed.

It is now 2012-12-30 14:35 UTC on the machine. Why has no new restartpoint been reached since then? I had 4008 archived WAL segments on my slave, and I expected them to be removed as streaming replication progressed. Are restartpoints prevented while long queries are running?

Thanks!
François Beausoleil

[1]: http://www.postgresql.org/docs/current/static/archive-recovery-settings.html#ARCHIVE-CLEANUP-COMMAND