Hi Mark,

On Tue, Jul 02, 2019 at 04:30:25PM -0400, Mark Steben wrote:
> Good afternoon,
> We run postgres 9.4 with hot standby streaming replication. We have been
> running abnormally high updates and the pg_xlogs have queued to about 80
> percent of disk capacity. This often happens and the network is not fast
> enough to handle the transfer of logs to the replicated server.
> What I have done in the past, and done again today is to change the
> archive_command from the script that scp's the logs to '/bin/true'. This
> stops the logship and allows the normal pg_xlog rotation to catch up.

Postgres cares only about the exit code of your "archive_command", so even
when you're just trying to get rid of WAL segments (without actually moving
them into an archive), it's best to set it to something that gives some kind
of indication that something's actually happening with the WAL.

I'd recommend replacing your `scp` invocation with a roughly equivalent
command line, prefixed with `echo`, so you can check whether or not your
archive_command is executed at all, and which WAL segment it's targeting.

What is your *normal* archive_command setting, with `scp` involved? Do you
have any "special" ssh_config flags in effect for its target host?

> This has not happened this time. Logs have been stored on pg_xlog since
> June 30 and the number of logs keep climbing. I am now manually moving
> some of the logs that were processed on June 30 off to another disk to
> alleviate the space. PLEASE HELP if you can.

Manually touching pg_xlog (or pg_wal on newer releases) is not a good idea, as
I'm sure you're aware. If I were in your position, my next steps would be:

1.) Create a number (think maybe five or six) of files that are a few hundred
    MBytes in size on the filesystem hosting your database. The idea is that,
    IF your filesystem fills up, you have a number of shots to restart the
    database without breaking a sweat after removing one of these "insurance
    files".

2.) Figure out what's the root cause for your WAL not being
    archived/recognized as successfully processed, and what WAL your
    postmaster is currently trying to deal with, as detailed above, and fix
    it - or return to the list with your findings if you can't see what's
    wrong.

> It looks like archiving is still occurring despite '/bin/true' being set.
> Can I safely kill -term the archive thread?

How exactly do you figure that there's still actual archiving going on? (Are
you monitoring the remote host's "auth" log or anything like that, which would
probably trigger upon scp-incurred logins?) Which process (full output from
`ps -fp <that-particular-PID>`) are you talking about killing in particular?

With all that said, you shouldn't really be using `scp` as an archive_command
in the first place. Maybe take a look at
https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ (and its
README file) for properties to consider when choosing an archive_command
that's fit for the job. (In my view, it's always better to perform archiving
to a *host-local* filesystem, and have its consumers pick it up from there,
instead of shipping directly onto a consumer's filesystem over the network.)

-- 
with best regards:
- Johannes Truschnigg ( johannes@xxxxxxxxxxxxxxx )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@xxxxxxxxxxxxxxx

Please do not bother me with HTML-email or attachments. Thank you.
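
P.S.: A minimal sketch of the two suggestions above (a logging stand-in for
archive_command, and the "insurance files"), in plain POSIX sh. The script
name `wal_debug.sh`, its install path, the log location and the
`insurance_N` file names are all made up for illustration; adapt them to your
setup. Note that the logging stand-in exits 0 whenever it could write its log
line, so Postgres will treat each segment as archived and recycle it, just as
it would with '/bin/true'.

```shell
#!/bin/sh
# Hypothetical helper script; all names and paths are illustrative assumptions.

# Record which WAL segment archive_command was invoked for, without copying
# anything. Possible postgresql.conf entry (assumed install path):
#   archive_command = '/usr/local/bin/wal_debug.sh log "%p" "%f"'
# %p expands to the segment's path, %f to its bare file name.
log_archive_attempt() {
    wal_path=$1
    wal_name=$2
    log_file=${ARCHIVE_DEBUG_LOG:-/tmp/archive_command.log}
    # If the log cannot be written, the non-zero status makes Postgres retry.
    printf '%s would archive %s (file name %s)\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$wal_path" "$wal_name" >> "$log_file"
}

# Create COUNT placeholder files of SIZE_MB megabytes each inside DIR.
# Caveat: on a compressing filesystem, zero-filled files may reserve far less
# space than their nominal size.
create_insurance_files() {
    dir=$1
    count=$2
    size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/insurance_$i" bs=1048576 count="$size_mb" \
            2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Dispatch on the first argument when run as a script.
case "${1:-}" in
    log)       log_archive_attempt "$2" "$3" ;;
    insurance) create_insurance_files "$2" "$3" "$4" ;;
esac
```

Something like `wal_debug.sh insurance /path/on/db/filesystem 5 300` would
cover step 1.) above; watching the log file then shows whether
archive_command is being run at all, and for which segment.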