Re: finding out what's generating WALs

Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx> · Tue, 11 Dec 2018 16:48:28 +0200

On 11/12/18 4:00 μ.μ., Chris Withers wrote:
Hi All,

With a 9.4 cluster, what's the best way to find out what's generating the most WAL?

I'm looking after a multi-tenant PG 9.4 cluster, and we've started getting alerts for the number of WALs on the server.
It'd be great to understand what's generating all that WAL and what's likely to be causing any problems.\

One way is to keep snapshots of pg_stat_user_tables and then try to identify spikes based on the various _tup fields.
Another way is to take a look in your archive (where you keep your archived wals), try to identify a period where excessive wals were generated and then use 
https://www.postgresql.org/docs/11/pgwaldump.html to see what's in there.

More generally, what's number of WALs is "too much"? check_postgres.pl when used in nagios format only appears to be able to alert on absolute thresholds, does this always make sense? What's a good 
threshold to alert on?

Regarding you wals in pg_wal,  a good threshold could be anything more than a e.g. 10% increase from wal_keep_segments with a trend to go up. If this number goes up chances are something bad is happening.

cheers,

Chris

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt