Re: Rapid disk usage spikes when updating large tables with GIN indexes

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 14 May 2018 13:16:26 -0400

Jonathan Marks <jonathanaverymarks@xxxxxxxxx> writes:
> One recurring, and predictable, issue that we have experienced regularly for multiple years is that inserting or updating rows in any table with GIN indexes results in extremely large drops in free disk space — i.e. inserting 10k rows with a total size of 10GB can result in the temporary loss of several hundred gigabytes of free disk space over 2-3 hours (and it could be more — we try to keep a 10-15% buffer of free disk space so that often represents almost all available disk space). Once we stop the operation, free disk space rapidly recovers, which makes us believe that this occurs due to logs, or some kind of temporary table. Our work_mem and maintenance_work_mem settings are pretty large (12GB and 62GB, respectively). The database’s size on disk scarcely budges during this process.

I'm not following exactly what you mean by "the database’s size on
disk scarcely budges" --- how does that square with the free disk space
dropping?  (IOW, what are you measuring?)

If you're not including WAL space in the "database size", then perhaps
a plausible theory is that the space consumption comes from a burst of
WAL output, and that the space is freed after the WAL has been dumped
off to secondary servers or archived or whatever you do with it.
If you do none of those things, it'd reduce to being an issue of how
long till the next checkpoint.

Assuming this theory is accurate, probably your use of fastupdate = off
is a contributing factor, as that causes a lot more "churn" in the
internals of the GIN indexes during updates, and correspondingly more
WAL output to log the changes.  But you probably don't want to revisit
that decision if you're happy with performance otherwise.

If you are archiving or streaming WAL, then probably what you want to
look at is getting rid of bottlenecks in that, so that it can keep up
with these WAL-generation spikes better.

If you're not, the only suggestion I can think of is to try twiddling
your checkpoint parameters to alleviate the space spikes.  Reducing
the checkpoint interval would do that, but you have to be very wary
of going too far; a short checkpoint interval results in more full-page
images being emitted to WAL and thus can actually increase your WAL
space consumption.  Depending on what parameters you're using now,
maybe even an increase would be better.

			regards, tom lane