On Fri, Aug 28, 2015 at 10:11 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Steve Kehlet <steve.kehlet@xxxxxxxxx> writes:
> This is Postgres 9.4.4. I am troubleshooting some occasional (every 3-4
> hours) slowness with UPDATEs on a table that has a GIN index on a JSONB
> column. During these episodes, UPDATEs that normally take < 1sec take
> upwards of 2-4 minutes, and all finish simultaneously, like they were all
> blocked on something and finally got released.
Hm ... have you tried checking pg_locks to see if they're blocked on
something identifiable?
You might be right that this is caused by flushing the GIN pending list,
but I thought that that was not supposed to block concurrent insertions.
What I'd expect to see is *one* insert taking significantly longer than
normal, but no effect on concurrent operations.
If they all have the same work_mem setting, then they all hit the pending list limit at the same time and all of them try to do the clean up. None of them realize that other clean-ups are in progress until one process gets far enough along to actually start truncating away the pending list pages. At that point the others will soon detect the concurrent clean up and bail out, but the performance damage has already been done.
This isn't just a performance problem, it could also lead to very rare correctness bugs that are being discussed on "hackers" (http://www.postgresql.org/message-id/55D183FF.60307@xxxxxx).
Also, 2-4 minutes sounds
much longer than should be needed to flush a 10MB pending list, anyway.
I've seen it take longer than that for 4MB pending lists, once the gin index doesn't fit in RAM. 10MB is 1280 pages. If that holds 1280 tuples and each tuple has 50 keys, that would be 64,000 keys. If the keys don't have a lot of internal overlap inserting them would take 6.4 minutes with 10,000 rpm drives, assuming all the non-leaf pages are already in memory. There is no pre-fetching (not even with multiple backends trying to do the same work at the same time), so a RAID won't speed things up.
Cheers,
Jeff