We have a backup server (via barman) that's not consuming its WAL files, and the disk filled up. I made some space on the disk, and now it's doing this:
2016-05-24 12:24:40 PDT : LOG: redo starts at 44E0/2A091CE0
2016-05-24 12:24:40 PDT : LOG: restored log file "00000001000044E00000002B" from archive
2016-05-24 12:24:40 PDT : LOG: restored log file "00000001000044E00000002C" from archive
2016-05-24 12:24:40 PDT : LOG: restored log file "00000001000044E00000002D" from archive
2016-05-24 12:24:40 PDT : LOG: restored log file "00000001000044E00000002E" from archive
---snip for brevity, about 30 files ---
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000050" from archive
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000051" from archive
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000052" from archive
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000053" from archive
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000054" from archive
2016-05-24 12:24:45 PDT : LOG: restored log file "00000001000044E000000055" from archive
2016-05-24 12:24:46 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:24:46 PDT : LOG: consistent recovery state reached at 44E0/56FFE488
2016-05-24 12:24:46 PDT : LOG: database system is ready to accept read only connections
2016-05-24 12:24:46 PDT : LOG: unexpected pageaddr 44D6/6A000000 in log segment 00000001000044E000000057, offset 0
2016-05-24 12:24:46 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:24:51 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:24:56 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:01 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:06 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:11 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:16 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:21 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:26 PDT : LOG: restored log file "00000001000044E000000056" from archive
2016-05-24 12:25:31 PDT : LOG: restored log file "00000001000044E000000056" from archive
Notice how it's repeating the last file. It does this forever.
If I stop the server and restart, it repeats this exact sequence, starting with 00000001000044E00000002B.
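For anyone reading the log, here is a rough sketch of how I understand a WAL segment file name to map to the address where that segment starts. It assumes the default 16 MB segment size (I have not checked whether this build changed it), and the function name segment_start_lsn is just something made up for illustration. The point is only to compare the address segment ...57 would be expected to start at with the pageaddr 44D6/6A000000 that the log complains about.

```python
# Sketch only: map a WAL segment file name to its starting LSN.
# Assumption: default 16 MB WAL segments (not verified for this build).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_start_lsn(name: str) -> str:
    """Return the starting LSN (as 'X/X' hex) of a 24-character WAL file name.

    The name is three 8-digit hex fields: timeline, high 32 bits of the
    LSN, and the segment number within that 4 GB range.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    int(name, 16)  # raises ValueError if the name is not pure hex
    log_id = int(name[8:16], 16)
    seg_no = int(name[16:24], 16)
    offset = seg_no * WAL_SEGMENT_SIZE
    return f"{log_id:X}/{offset:X}"


print(segment_start_lsn("00000001000044E000000057"))  # 44E0/57000000
```

If that is right, the first page of segment ...57 should carry address 44E0/57000000, not 44D6/6A000000.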
The recovery.conf file looks like this:
standby_mode = on
restore_command = 'cp /data/pg_wal_ship_dock/%f %p 2>/dev/null'
archive_cleanup_command = '/usr/local/pgsql-9.3.5/bin/pg_archivecleanup /data/pg_wal_ship_dock %r 2>>cleanup.log'
The /data/pg_wal_ship_dock directory currently has 590GB of WAL files, which is why the disk filled up to begin with. The cleanup.log file is empty.
This is PG 9.3.5 running on Ubuntu.
Any suggestions where to look next?
Thanks,
Craig