Re: understanding max_wal_size,wal_keep_segments and checkpoints

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Wed, 13 Feb 2019 12:45:58 +0200

> I'm trying to understand the logic behind all of these so I would be happy
> if you can confirm what I understood or correct me if I'm wrong :
> -The commit command writes all the data in  the wal_buffers is written into the wal files.

All the transaction log for the transaction has to be written to file, and the
files have to be sync'ed to storage before COMMIT completes.
That way the transaction can be replayed in case of a crash.

 Yeah, so basically if we open a transaction and we do some insert queries, until the transaction is commited the changes(the wal data and not the blocked that are chaned) are kept in the wal buffers ? .  When the user commits the transaction, the wal buffer(only the transaction log of that specific transaction ?) is written to wal files. When the database completes saving the content of the transaction log into the wal files, the commit completes. Did I got it right ?

> -Checkpoints writes the data itself (blocks that were changed) into the data files
>  in the base dir. Just to make sure, as part of the checkpoint, it needs to read the
>  wal files that were generated since the last checkpoint right ?

No WAL file has to be read during a checkpoint.

When data in the database ar modified, they are modified in the "shared buffers"
RAM cache.  Later, these "direty blocks" are written to disk by the background
writer process or the checkpoint.

What I meant, when checkpoint occurs, it reads the wal files created since last checkpoint, and does those changing on the data blocks on the disk ? I was not talking about dirty blocks from shared_buffer.

> -max_wal_size is a soft limit for the total size of all the wals that were generated.
>  When the total_size of the pg_xlog dir reaches max_wal_size(can increase it because
>  of peaks and some other issues..) the db will force a checkpoint to write the changes
>  from the wals into the disk and then it will start recycling old wals (all of them ?
>  or only those who were written ?).

A checkpoint is forced when more than max_wal_size WAL has been written since the
last checkpoint.

After a checkpoint, unneeded WAL segments are either recycled (renamed and reused)
or deleted (if max_wal_size has been exceeded).

WAL segments are unneeded if they are older than the checkpoint, have been archived
(if archiving is configured), don't need to be kept around because of wal_keep_segments
and are older than the position of any active replication slot.
so I'f I want have replication slot and wal_keep_segment is 0 after the archiving of the wal it should be recycled/deleted ?

> -wal_keep_segments is meant to help standbys that didn't receive the wals, so it allow
>  us to keep wal_keep_segments wals in our pg_xlog dir.

Yes.

> - in case we have a collision between wal_keep_segments and max_wal_size the
>  wal_keep_segments will be the one that be used right ?. For example, lets say my
>  wal_size is default(16MB). I set max_wal_size to 1GB which is 1600/16=100 wals.
>  However, my wal_keep_segments is set to 300. It means that when the total_size of
>  the pg_xlog directory will reach 1GB, checkpoint will be forced but old wal files
>  wont be recycled/deleted ?

Checkpoints are not forced by the size of pg_xlog, but by the amount of WAL
created since the last checkpoint. 

The last wal_keep_segments WAL segments are always kept around, even if that
exceeds max_wal_size.
So basically having wal_keep_segments and replication slot configured together is a mistake right ? In that case, if you have both configured, and you set wal_keep_segments to 0, the db should delete all the unused wals ?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com