Re: Backup server

Jerry Sievers <gsievers19@xxxxxxxxxxx> · Tue, 24 May 2016 15:32:29 -0500

Craig James <cjames@xxxxxxxxxxxxxx> writes:

> We have a backup server (via barman) that's not consuming its WAL files, and the disk filled up. I made some space on the disk, and now it's doing this:
>
>     2016-05-24 12:24:40 PDT : LOG:  redo starts at 44E0/2A091CE0
>     2016-05-24 12:24:40 PDT : LOG:  restored log file "00000001000044E00000002B" from archive
>     2016-05-24 12:24:40 PDT : LOG:  restored log file "00000001000044E00000002C" from archive
>     2016-05-24 12:24:40 PDT : LOG:  restored log file "00000001000044E00000002D" from archive
>     2016-05-24 12:24:40 PDT : LOG:  restored log file "00000001000044E00000002E" from archive
>     ---snip for brevity, about 30 files ---
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000050" from archive
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000051" from archive
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000052" from archive
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000053" from archive
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000054" from archive
>     2016-05-24 12:24:45 PDT : LOG:  restored log file "00000001000044E000000055" from archive
>     2016-05-24 12:24:46 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:24:46 PDT : LOG:  consistent recovery state reached at 44E0/56FFE488
>     2016-05-24 12:24:46 PDT : LOG:  database system is ready to accept read only connections
>     2016-05-24 12:24:46 PDT : LOG:  unexpected pageaddr 44D6/6A000000 in log segment 00000001000044E000000057, offset 0
>     2016-05-24 12:24:46 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:24:51 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:24:56 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:01 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:06 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:11 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:16 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:21 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:26 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>     2016-05-24 12:25:31 PDT : LOG:  restored log file "00000001000044E000000056" from archive
>
> Notice how it's repeating the last file. It does this forever.
>
> If I stop the server and restart, it repeats this exact sequence, starting with 00000001000044E00000002B.
>
> The recovery.conf file looks like this:
>
>     standby_mode = on
>     restore_command = 'cp /data/pg_wal_ship_dock/%f %p 2>/dev/null'
>     archive_cleanup_command = '/usr/local/pgsql-9.3.5/bin/pg_archivecleanup /data/pg_wal_ship_dock %r 2>>cleanup.log'
>
> The /data/pg_wal_ship_dock directory currently has 590GB of WAL files, which is why the disk got full to begin with. The cleanup.log file is empty.
>
> This is PG 9.3.5 running on Ubuntu.
>
> Any suggestions where to look next?

Yes.

Don't send your cp command's stderr to /dev/null.  Let it spill into the
server log, or redirect it to some actual file, and see whether cp is
complaining about anything.
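
As a rough sketch, the restore_command from your recovery.conf could be
changed along these lines.  The log path /data/restore_cp.log is only a
placeholder I made up; use any file the postgres user can write to, or
drop the redirect entirely so the messages land in the server log.

```
standby_mode = on
# Append cp's error output to a file instead of discarding it.
# /data/restore_cp.log is a hypothetical path, not from your setup.
restore_command = 'cp /data/pg_wal_ship_dock/%f %p 2>>/data/restore_cp.log'
```

After a restart, anything cp prints when it fails to fetch a segment
should show up in that file.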

And you should upgrade your 9.3 to the latest minor release.  You're way
behind.

> Thanks,
> Craig
>

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@xxxxxxxxxxx
p: 312.241.7800

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin