Re: Rapid disk usage spikes when updating large tables with GIN indexes

Jonathan Marks <jonathanaverymarks@xxxxxxxxx> · Mon, 14 May 2018 15:31:42 -0400

We’ll turn on log_temp_files and get back to you to see if that’s the cause. Re: the exact queries — these are just normal INSERTs and UPDATEs. This occurs as part of normal database operations — i.e., we are processing 10% of a table and marking changes to a particular row, or happen to be inserting 5-10% of the table volume with new rows. Whenever we bulk load we have to drop the indexes because the disk space loss just isn’t tenable.

Re: extra disk space consumption not within PG — the AWS folks can’t tell me what the problem is because it’s all internal to the PG part of the instance they can’t access. Doesn’t mean your last suggestion can’t be the case but makes it slightly less likely.

Any chance that GIN indexes are double-logging? I.e. with fastupdate off they are still trying to keep track of the changes in the pending list or something?

Our thought has been temp files for a while, but we’re not sure what we should do if that turns out to be the case.

> On May 14, 2018, at 3:08 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> 
> [ please keep the list cc'd ]
> 
> Jonathan Marks <jonathanaverymarks@xxxxxxxxx> writes:
>> Thanks for your quick reply. Here’s a bit more information:
>> 1) to measure the “size of the database” we run something like `select datname, pg_size_pretty(pg_database_size(datname)) from pg_database;` I’m not sure if this includes WAL size.
>> 2) I’ve tried measuring WAL size with `select sum(size) from pg_ls_waldir();` — this also doesn’t budge.
>> 3) Our current checkpoint_timeout is 600s with a checkpoint_completion_target of 0.9 — what does that suggest?
> 
> Hmph.  Your WAL-size query seems on point, and that pretty much destroys
> my idea about a WAL emission spike.
> 
> pg_database_size() should include all regular and temporary tables/indexes
> in the named DB.  It doesn't include WAL (but we've eliminated that), nor
> cluster-wide tables such as pg_database (but those seem pretty unlikely
> to be at issue), nor non-relation temporary files such as sort/hash temp
> space.  At this point I think we have to focus our attention on what might
> be creating large temp files.  I do not see anything in the GIN index code
> that could do that, especially not if you have fastupdate off.  I wonder
> whether there is something about the particular bulk-insertion queries
> you're using that could result in large temp files --- which'd make the
> apparent correlation with GIN index use a mirage, but we're running out
> of other ideas.  You could try enabling log_temp_files to see if there's
> anything to that.
> 
> In the grasping-at-straws department: are you quite sure that the extra
> disk space consumption is PG's to begin with, rather than something
> outside the database entirely?
> 
> 			regards, tom lane