Re: Standby is not removing restored WAL segments

Eduardo Morras <emorrasg@xxxxxxxx> · Tue, 9 Sep 2014 10:45:24 +0200

On Fri, 5 Sep 2014 09:33:57 +0200
Alexey Klyukin <alexk@xxxxxxxxxxxx> wrote:

> Greetings,
> 
> We've got a 9.3.5 DB running in standby mode for a fairly large DB
> (500GB) with busy WAL traffic (a couple of GBs per hour), and it
> occasionally 'forgets' to remove the segments it restored.
> 
> checkpoint_segments is set to 128, and we usually observe around
> 270 segments accumulated, but when the problem happens our check
> triggers at around 2K segments. A manual CHECKPOINT command takes
> ages to complete there, a fast shutdown is very slow (around 10
> minutes, versus usually less than 1 minute) and the WAL receiver
> process is also unable to run for some reason.
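
For scale, a quick sanity check of these numbers. As I read the 9.3 "WAL Configuration" documentation, pg_xlog should normally hold no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 files, whichever is larger. The sketch assumes the defaults checkpoint_completion_target = 0.5 and wal_keep_segments = 0 (neither value is given above) and 16 MB segments. expected_max_segments is a hypothetical helper, not a PostgreSQL function.

```python
# Rough ceiling on pg_xlog segment count, following the formula in the
# PostgreSQL 9.3 "WAL Configuration" docs. checkpoint_completion_target and
# wal_keep_segments are ASSUMED defaults; the thread does not state them.

def expected_max_segments(checkpoint_segments: int,
                          checkpoint_completion_target: float = 0.5,
                          wal_keep_segments: int = 0) -> int:
    """Return the documented 'normally not more than' segment count."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return int(max(by_checkpoints, by_keep))


SEGMENT_MB = 16  # default WAL segment size (assumed unchanged at build time)

limit = expected_max_segments(128)
print(f"expected ceiling: ~{limit} segments (~{limit * SEGMENT_MB / 1024:.1f} GB)")
# prints "expected ceiling: ~321 segments (~5.0 GB)"

for observed in (270, 2000):  # the usual count and the alert level quoted above
    print(f"{observed} segments: {observed / limit:.1f}x the ceiling")
# prints "270 segments: 0.8x the ceiling" and "2000 segments: 6.2x the ceiling"
```

Under those assumptions the usual ~270 segments sit inside the documented ceiling, while 2K is about six times over it, which is consistent with restartpoints not completing rather than with ordinary fluctuation.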
> 
> The only way to make this host delete WAL files is to restart it. The
> particularly notable restartpoint right after the shutdown shows
> quite a number of removed files and buffers written (shared_buffers
> is set to 8GB on this system):
> 
> 2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,2014-05-19
> 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473
> buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6
> recycled; write=9.745 s, sync=680.314 s, total=694.447 s; sync
> files=499, longest=37.774 s, average=1.363 s",,,,,,,,,""
> 
> If we leave the host running, this restartpoint never happens.
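
The restartpoint log line above can be cross-checked with a little arithmetic. A minimal sketch, assuming the default 8 kB block size: 8GB of shared_buffers is 1048576 buffers, and 332473 of them is the reported 31.7%. The same figures show that almost all of the 694 s went into the sync phase (sync=680.314 s over 499 files), not into writing. PATTERN and parse_restartpoint are hypothetical helpers written against the message text quoted above, not part of any PostgreSQL tool.

```python
import re

# Message text from the CSV log line quoted above (CSV fields stripped).
MSG = ("restartpoint complete: wrote 332473 buffers (31.7%); "
       "0 transaction log file(s) added, 1237 removed, 6 recycled; "
       "write=9.745 s, sync=680.314 s, total=694.447 s; "
       "sync files=499, longest=37.774 s, average=1.363 s")

# Hypothetical pattern matching the 9.3 checkpoint/restartpoint completion text.
PATTERN = re.compile(
    r"wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s; "
    r"sync files=(?P<files>\d+), longest=(?P<longest>[\d.]+) s, "
    r"average=(?P<average>[\d.]+) s"
)


def parse_restartpoint(msg: str) -> dict:
    """Extract the numeric fields of a completion message as floats."""
    m = PATTERN.search(msg)
    if m is None:
        raise ValueError("not a checkpoint/restartpoint completion message")
    return {key: float(value) for key, value in m.groupdict().items()}


stats = parse_restartpoint(MSG)

BLOCK_SIZE = 8192                  # default BLCKSZ (assumed)
SHARED_BUFFERS = 8 * 1024 ** 3     # 8GB, as stated in the thread
total_buffers = SHARED_BUFFERS // BLOCK_SIZE

print(f"written: {stats['buffers'] / total_buffers:.1%} of shared_buffers")
print(f"sync share of total time: {stats['sync'] / stats['total']:.1%}")
print(f"segments removed or recycled: {stats['removed'] + stats['recycled']:.0f}")
```

The same parser can be pointed at the standby's regular restartpoint messages to see whether they stop appearing, or whether their sync time grows, in the period before the segments start piling up.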
> 
> The only difference I can come up with between this host and the
> other databases that do not show this behavior is that it runs with
> max_standby_streaming_delay and max_standby_archive_delay set to -1,
> but at the time we observed the problem no queries were running on it
> at all.
> 
> The problem occurs rarely, but steadily, around once every 3 months.
> During this time PostgreSQL has been upgraded from 9.0 to 9.3,
> which did not solve the issue.
> 

Perhaps the deletion of WAL files is attempted before, in filesystem time, the WAL file has been closed by the filesystem, and the delete fails with a "file still open" error.

> Any clues on how we can debug and diagnose the problem further to come
> up with a proper bug report, if it is a bug, or are we missing
> something in the configuration that causes this?
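
As for capturing the problem while it happens: one low-tech option is to log, on a schedule, both the number of segment files in pg_xlog and the oldest one present. If the count climbs while the oldest name stays fixed, nothing is being removed, and the log tells you when removal stopped. A minimal sketch, assuming segment files are the 24-hex-digit names and that sorting by name approximates age within one timeline; wal_segments and check are hypothetical helpers, and the directory and threshold are placeholders.

```python
import os
import re

# WAL segment files are named timeline + log + segment, 8 hex digits each.
# .history, .backup and the archive_status directory do not match.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def wal_segments(xlog_dir: str) -> list:
    """Return the sorted names of WAL segment files in xlog_dir."""
    return sorted(n for n in os.listdir(xlog_dir) if SEGMENT_RE.match(n))


def check(xlog_dir: str, threshold: int) -> tuple:
    """Return (status, segment count, oldest segment name or None)."""
    segs = wal_segments(xlog_dir)
    oldest = segs[0] if segs else None
    status = "ALERT" if len(segs) > threshold else "ok"
    return status, len(segs), oldest
```

For example, call check("/path/to/data/pg_xlog", threshold=400) from cron and append the result with a timestamp; 400 is an arbitrary threshold, so pick one comfortably above the normal ~270 segments.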
> 
> 
> Regards,
> -- 
> Alexey Klyukin
> 
> 
> -- 
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin

---   ---
Eduardo Morras <emorrasg@xxxxxxxx>
