Re: pg_xlog no longer rotating out

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Mark,

On Tue, Jul 02, 2019 at 04:30:25PM -0400, Mark Steben wrote:
> Good afternoon,
> We run postgres 9.4 with hot standby streaming replication.  We have been
> running abnormally high updates and the pg_xlogs have queued to about 80
> percent of disk capacity.  This often happens and the network is not fast
> enough to handle the transfer of logs to the replicated server.
> What I have done in the past, and done again today is to change the
> archive_command from the script that scp's the logs to '/bin/true'.  This
> stops the logship and allows the normal pg_xlog rotation to catch up.

Postgres cares only about the exit code of your "archive_command", so even
when you're just trying to get rid of WAL segments (without actually moving
them into an archive), it's best to set it to something that gives some kind
of indication that something's actually happening with the WAL. I'd recommend
replacing your `scp` invocation with a roughly equivalent command line,
prefixed with `echo`, so you can check whether or not your archive_command is
executed at all, and which WAL segment it's targeting.

What is your *normal* archive_command setting, with `scp` involved? Do you
have any "special" ssh_config flags in effect for its target host?


> This has not happened this time.  Logs have been stored on pg_xlog since
> June 30 and the number of logs keep climbing.  I am now manually moving
> some of the logs that were processed on June 30 off to another disk to
> alleviate the space. PLEASE HELP if you can.

Manually touching pg_xlog (or pg_wal on newer releases) is not a good idea, as
I'm sure you're aware. If I were in your position, my next steps would be:

1.) Create a number (think maybe five or six) files that are a few hundred
MBytes in size on the filesystem hosting your database. The idea is that, IF
your filesystems fills up, you have a number of shots to restart the database
without breaking a sweat after removing one of these "insurance files".

2.) Figure out what's the root cause for your WAL not being
archived/recognized as successfully processed, and what WAL you postmaster is
currently trying to deal with, as detailed above, and fix it - or return to
the list with your findings if you can't see what's wrong.


> It looks like archiving is still occurring despite '/bin/true' being set.
> Can I safely kill -term the archive thread?

How exactly do you figure that there's still actual archiving going on? (Are
you monitoring the remote host's "auth" log or anything like that, which would
probably trigger upon scp-incurred logins?)

Which process (full output from `ps -fp <that-particular-PID>`) are you
talking about killing in particular?



With all that said, you shouldn't really be using use `scp` as an
archive_command in the first place. Maybe take a look at
https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ (and its
README file) for properties to consider when choosing an archive_command
that's fit for the job. (In my view, it's always better to perform archiving
to a *host-local* filesystem, and have its consumers pick it up from there,
instead of shipping directly onto a consumer's filesystem over the network.)

-- 
with best regards:
- Johannes Truschnigg ( johannes@xxxxxxxxxxxxxxx )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@xxxxxxxxxxxxxxx

Please do not bother me with HTML-email or attachments. Thank you.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux