Re: max_slot_wal_keep_size

Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> · Mon, 16 Aug 2021 10:08:56 -0400

On 2021-Aug-16, Scott Ribe wrote:

> If I use max_slot_wal_keep_size to limit disk impact of a down
> replica, and subsequently a down replica causes PG to hit this limit,
> is there a particular message that will be logged when the limit is
> crossed and PG starts to purge WAL?
> 
> Context is: trying to debug a failure to bring up a replica, where the
> failure happened in the middle of a moderately complex chain of events
> that likely started with a bad disk. (Patroni is involved, FWIW)

Yes, you should see a message like this in the server log:
  invalidating slot "..." because its restart_lsn ... exceeds max_slot_wal_keep_size
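
If you want to scan existing logs for that message, something like the
following might help. It is only a rough sketch: the function name
find_invalidations is made up for illustration, and because the slot name
and LSN are elided ("...") in the message above, the pattern captures them
as opaque strings. It uses a substring search so that whatever
log_line_prefix you have configured should not matter.

```python
import re

# Loose pattern for the invalidation message quoted above. The slot name and
# LSN are elided ("...") in the original message, so they are captured as
# opaque strings rather than parsed or validated.
INVALIDATION_RE = re.compile(
    r'invalidating slot "(?P<slot>[^"]+)" because its restart_lsn '
    r'(?P<lsn>\S+) exceeds max_slot_wal_keep_size'
)


def find_invalidations(lines):
    """Return a (slot, restart_lsn) tuple for each line reporting an invalidation.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    hits = []
    for line in lines:
        # search() rather than match(): the message may follow a log prefix.
        m = INVALIDATION_RE.search(line)
        if m:
            hits.append((m.group("slot"), m.group("lsn")))
    return hits
```

You would pass it an open log file, e.g. find_invalidations(open(path)),
where path is wherever your server log lives.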

However, a bug in that area was fixed recently: the slot would be
invalidated but the WAL space would not be freed. The fix was committed
on July 16th and released with last week's minor releases:

Author: Alvaro Herrera <alvherre@xxxxxxxxxxxxxx>
Branch: master [ead9e51e8] 2021-07-16 12:07:30 -0400
Branch: REL_14_STABLE [e5bcbb107] 2021-07-16 12:07:30 -0400
Branch: REL_13_STABLE Release: REL_13_4 [866237a6f] 2021-07-16 12:07:30 -0400

    Advance old-segment horizon properly after slot invalidation

    When some slots are invalidated due to the max_slot_wal_keep_size limit,
    the old segment horizon should move forward to stay within the limit.
    However, in commit c6550776394e we forgot to call KeepLogSeg again to
    recompute the horizon after invalidating replication slots.  In cases
    where other slots remained, the limits would be recomputed eventually
    for other reasons, but if all slots were invalidated, the limits would
    not move at all afterwards.  Repair.

    Backpatch to 13 where the feature was introduced.

    Author: Kyotaro Horiguchi <horikyota.ntt@xxxxxxxxx>
    Reported-by: Marcin Krupowicz <mk@xxxxxxx>
    Discussion: https://postgr.es/m/17103-004130e8f27782c9@xxxxxxxxxxxxxx

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/