On Fri, 5 Sep 2014 09:33:57 +0200
Alexey Klyukin <alexk@xxxxxxxxxxxx> wrote:

> Greetings,
>
> We've got a 9.3.5 DB running in a standby mode for a fairly large DB
> (500GB) with a busy WAL traffic (couple of GBs per hour) and it
> occasionally 'forgets' to remove the segments it restored.
>
> The checkpoint_segments is set to 128, and usually we observe around
> 270 segments accumulated, but at the time it happens our check
> triggers at around 2K segments. The manual checkpoint command takes
> ages to complete there, the fast shutdown is very slow (around 10
> minutes, usually less than 1 minute) and the WAL receiver process is
> also unable to run for some reason.
>
> The only way to make this host delete WAL files is to restart. The
> particularly notable restart point right after the shutdown shows
> quite a number of removed files and buffers written (the shared
> buffers is set to 8GB on this system):
>
> 2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,2014-05-19
> 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473
> buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6
> recycled; write=9.745 s, sync=680.314 s, total=694.447 s; sync
> files=499, longest=37.774 s, average=1.363 s",,,,,,,,,""
>
> If we leave the host running, this restartpoint never happens.
>
> The only difference I can come up with from the other databases that
> do not show this behavior is that the host is running with
> max_standby_streaming_delay and max_standby_archive_delay set to -1,
> but at the time we observed the problem no queries were running on it
> at all.
>
> The problem occurs rarely, but steadily, around once every 3 months.
> During this time the PostgreSQL has been upgraded from 9.0 to 9.3,
> which did not solve the issue.
>

Perhaps the WAL files are being deleted before the filesystem has actually closed them, so the delete fails with a "file still open" error.

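To see whether restartpoints are still completing while the segments pile up, the "restartpoint complete" message quoted above could be tracked over time. A minimal Python sketch, assuming the message format shown in the quoted log line; parse_restartpoint is a made-up helper for illustration, not a PostgreSQL tool:

```python
import re

# Hypothetical helper (not part of PostgreSQL): extract the counters from a
# "restartpoint complete" message in the format quoted in this thread.
RESTARTPOINT_RE = re.compile(
    r"restartpoint complete: wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\); "
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, total=(?P<total>[\d.]+) s"
)

INT_FIELDS = ("buffers", "added", "removed", "recycled")


def parse_restartpoint(message):
    """Return a dict of counters, or None if the message does not match."""
    # Collapse whitespace so a message wrapped by a mail client still matches.
    match = RESTARTPOINT_RE.search(" ".join(message.split()))
    if match is None:
        return None
    return {
        key: (int(value) if key in INT_FIELDS else float(value))
        for key, value in match.groupdict().items()
    }


# The message text from the log line quoted above.
sample = (
    "restartpoint complete: wrote 332473 buffers (31.7%); "
    "0 transaction log file(s) added, 1237 removed, 6 recycled; "
    "write=9.745 s, sync=680.314 s, total=694.447 s; sync files=499, "
    "longest=37.774 s, average=1.363 s"
)
stats = parse_restartpoint(sample)
```

Fed with every matching line from the standby's log, this would show when the last restartpoint finished and how many segments it removed, which may help narrow down when the cleanup stops.
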
> Any clues on how can we debug and diagnose the problem further to come
> up with a proper bug report, if it is a bug, or are we missing
> something in the configuration that causes this?
>
>
> Regards,
> --
> Alexey Klyukin
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin

---   ---
Eduardo Morras <emorrasg@xxxxxxxx>