Re: WAL still kept in pg_xlog even long after heavy workload is done

MichaelDBA <MichaelDBA@xxxxxxxxxxx> · Fri, 27 Oct 2017 08:50:01 -0400

To get the values right, 
you have to consider the "unit" column in pg_settings.  On mine, it is 
16M for both min and max wal size.  So it would be

1024 x 1024 x 16 x <value> (pg_settings.min_wal_size or 
pg_settings.max_wal_size)

The result of this formula should be close to what you specified in 
postgresql.conf.

So modify your SQL a bit:

psql -c "select name, setting, unit
 from pg_settings where name like '%wal_size';"

Regards,

Michael Vitale

   	Stefan Petrea
        Friday,
 October 27, 2017 7:28 AM

  Hello,

We're 
encountering some problems with WAL growth in production with
PostgreSQL
 9.6.3 and 9.6.2. From what I know a WAL file can either be
recycled(and
 would be reused) or deleted.
We'd like to have better control over 
the amount of WAL that is kept around.
There were a few occasions 
where we had to resize partitions because
pg_xlog grew as much as it 
did.

According to the docs [1] there are some parameters in GUC 
(postgresql.conf) about this.
The parameters I've been able to 
identify are the following:

* wal_keep_segments
* max_wal_size
*
 min_wal_size 

Our WAL grows a lot around the time of our product
 upgrades (that is,
when we upgrade to a new version of our database,
 so not a Pg upgrade,
just a newer version of our db schema, plpgsql 
code etc).
As part of this upgrade, we add new columns or have some 
large UPDATEs
on tables as big as 300M (but in one case we also have 
one with 1.5B rows).

I am seeing the following int he docs [3]

    min_wal_size (integer)
    As long as WAL disk usage stays below 
this setting, old WAL files are 
    always recycled for future use 
at a checkpoint, rather than removed.
    This can be used to ensure 
that enough WAL space is reserved to handle
    spikes in WAL usage, 
for example when running large batch jobs. The default
    is 80 MB. 
This parameter can only be set in the postgresql.conf file or
    on 
the server command line.

This sounds very familiar because, 
that's essentially what we're doing. There
are some large jobs that 
cause a lot of workload and changes and generate a lot of WAL.

So
 far, the way I interpret this is min_wal_size is the amount of WAL
recycled
 (that is kept around to be reused) and max_wal_size is the
total 
amount of WAL allowed to be kept on disk.

I would also like to 
interpret the default values of min_wal_size and max_wal_size.
So if I
 run the following query:

    psql -c "select name, setting from 
pg_settings where name like '%wal_size';"

I get the following:

    max_wal_size|2097152
    min_wal_size|1048576

Do these two
 values look ok?

Both these values were generated by pgtune [4], 
but it seems like pgtune
thinks they're expressed by default in KB.
Looking
 at the PostgreSQL code, it seems to me that these two are
expressed 
in MB, at least that's what I understand when I see
GUC_UNIT_MB in 
the source code [6].

So maybe the pgtune fork we're using has a 
bug in the sense that it
produces an incorrect value for those two 
parameters? (should be in MB
but is expressed in KB, therefore much 
higher than what it should be).

Another question is, how can I 
use any of the checkpoint settings
to control the WAL that is kept 
around?

* checkpoint_timeout 
* checkpoint_completion_target 
*
 checkpoint_flush_after 
* checkpoint_warning 

=========

I
 actually tried something with these settings on a test environment.
I've
 used the following settings:

    checkpoint_timeout = 40s

 min_wal_size = 600MB
    max_wal_size = 900MB

Then I've 
created a db named x1 and ran this on it four or five times.

pgbench -i -s 70 x1

The pg_xlog directory grew to 2.2G and after a
 few minutes, it decreased to 2.0G
After about 40 minutes it 
decreased to 1.4G and it's not going any lower.
I was expecting 
pg_xlog's size to be 600MB after the first WAL removal had run.
Should
 I expect that the size will eventually drop to 600MB or will it just 
sit there at 1.4G?

=========

Other thoughts:

I have
 looked a bit at Pg internals too, I'm seeing four functions
there 
that are responsible for removing WAL: XLogArchiveIsReady,
RemoveXlogFile,
 RemoveOldXlogFiles, XLOGfileslop.
All of these belong to 
/src/backend/access/transam/xlog.c

The only place in the code 
that seems to take a decision about how much
WAL to recycle and how 
much to remove is the function XLOGfileslop [2].

It seems like 
XLOGfileslop is an estimate for the number of WAL to keep
around(recycled
 WAL). Both max_wal_size and min_wal_size are used inside
XLOGfileslop.

As
 far as checkpoint_* GUC settings go, they seem to be involved as well.
So
 far, the only thing I know about checkpoints is that between
checkpoints,
 many WAL are created.  The amount of WAL between checkpoints
can 
vary. I don't have a good understanding about the interplay between
checkpoints
 and WAL.

I'd be grateful for any thoughts on how to improve 
this, and better control
the amount of WAL kept in pg_xlog. 

Thank
 you,
Stefan

[1] 
https://www.postgresql.org/docs/9.6/static/wal-configuration.html
[2]

https://github.com/postgres/postgres/blob/0c5803b450e0cc29b3527df3f352e6f18a038cc6/src/backend/access/transam/xlog.c#L2258
[3]

https://www.postgresql.org/docs/9.6/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
[4]
 https://github.com/kmatt/pgtune
[5] 
https://github.com/kmatt/pgtune/blob/master/pgtune#L560
[6] 
https://github.com/postgres/postgres/blob/f49842d1ee31b976c681322f76025d7732e860f3/src/backend/utils/misc/guc.c#L2268

Stefan
 Petrea
System Engineer

stefan.petrea@xxxxxxxxxx