Re: PostgreSQL 10.5 : Strange pg_wal fill-up, solved with the shutdown checkpoint

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Achilleas Mantzios wrote:
> On Sunday morning I started getting email warnings about running out of space on our pg_wal dir (which is a symlink to a separate disk).
> 
> As the size was about to reach 100% of the disk (100GB) I shutdown the database in order to investigate without the pressure of a potentially worse situation. Only to see that the restart solved the 
> problem. I had not kept track of the total number of files in pg_wal during the previous days, but normally they are about 550->600 files (see relevant settings below), so since the info on the log 
> regarding checkpoints refer to the changes done by those checkpoints I am not sure about the number of files before the 85% threshold at which our monitoring system starts to send out alerts. But 
> doing the calculations this 85% is reached when the number of files exceeds 5000.
> The removal of wal files (according to checkpoint log entries) stopped some time in late Friday.
> 
>   [782] 5bbc6127.30e 2018-11-02 22:59:20.015 EET  @ line:5560 LOG: checkpoint complete: wrote 7758 buffers (0.7%); 0 WAL file(s) added, 5 removed, 21 recycled; write=777.496 s, sync=0.006 s, 
> total=777.554 s; sync files=152, longest=0.002 s, average=0.000 s; distance=108549 kB, estimate=412619 kB
>   [782] 5bbc6127.30e 2018-11-02 23:13:14.493 EET  @ line:5562 LOG: checkpoint complete: wrote 7104 buffers (0.7%); 0 WAL file(s) added, 13 removed, 18 recycled; write=712.311 s, sync=0.007 s, 
> total=712.379 s; sync files=190, longest=0.001 s, average=0.000 s; distance=473908 kB, estimate=473908 kB
>   [782] 5bbc6127.30e 2018-11-02 23:26:10.157 EET  @ line:5564 LOG: checkpoint complete: wrote 7623 buffers (0.7%); 0 WAL file(s) added, 0 removed, 20 recycled; write=638.089 s, sync=0.006 s, 
> total=638.152 s; sync files=245, longest=0.006 s, average=0.000 s; distance=716742 kB, estimate=716742 kB
>   [782] 5bbc6127.30e 2018-11-02 23:41:24.253 EET  @ line:5566 LOG: checkpoint complete: wrote 7277 buffers (0.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=652.984 s, sync=0.002 s, 
> total=653.014 s; sync files=195, longest=0.001 s, average=0.000 s; distance=326408 kB, estimate=677709 kB
>   [782] 5bbc6127.30e 2018-11-02 23:47:25.473 EET  @ line:5568 LOG: checkpoint complete: wrote 1138 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=114.098 s, sync=0.002 s, 
> total=114.110 s; sync files=189, longest=0.002 s, average=0.000 s; distance=84617 kB, estimate=618400 kB
> 
> No deletions happened in Saturday, neither in Sunday till the shutdown. Then shutdown gave :
>   [782] 5bbc6127.30e 2018-11-04 10:49:55.571 EET  @ line:5916 LOG: checkpoint starting: shutdown immediate
> ....
>   [782] 5bbc6127.30e 2018-11-04 10:50:04.838 EET  @ line:5917 LOG: checkpoint complete: wrote 2206 buffers (0.2%); 0 WAL file(s) added, 5128 removed, 105 recycled; write=0.038 s, sync=0.023 s, 
> total=9.268 s; sync files=129, longest=0.014 s, average=0.000 s; distance=261973 kB, estimate=620287 kB
> 
> The removal of those 5128 files freed 80GB of space. (and the used space is still low as we speak after about 180GB of traffic since the incident).
> 
> Our current settings are :
> 
> wal_keep_segments = 512
> max_wal_size = 2GB
> min_wal_size = 1GB

If you set log_min_messages to DEBUG2, you should get a log message like

   attempting to remove WAL segments older than log file ...

during each checkpoint.  That would help to determine what is going on.

The traditional things to hold back WAL segment removal are wal_keep_segments and
replication slots, but that should also hold for a shutdown checkpoint.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux