On Fri, Aug 28, 2015 at 10:00 AM, Steve Kehlet <steve.kehlet@xxxxxxxxx> wrote:
This is Postgres 9.4.4. I am troubleshooting some occasional (every 3-4 hours) slowness with UPDATEs on a table that has a GIN index on a JSONB column. During these episodes, UPDATEs that normally take < 1sec take upwards of 2-4 minutes, and all finish simultaneously, like they were all blocked on something and finally got released.My team and I have done the usual stuff looking at the system itself and its underlying infrastructure, and we don't think that's it. It does seem to be localized to just this table.We're reading about GIN fast updates and noted that:As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed, or if the pending list becomes too large (larger than work_mem), the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves GIN index update speed, even counting the additional vacuum overhead. Moreover the overhead work can be done by a background process instead of in foreground query processing.
The main disadvantage of this approach is that searches must scan the list of pending entries in addition to searching the regular index, and so a large list of pending entries will slow searches significantly. Another disadvantage is that, while most updates are fast, an update that causes the pending list to become "too large" will incur an immediate cleanup cycle and thus be much slower than other updates.I am wondering if the UPDATEs are hitting some blockage on the table due to this cleanup cycle. Is this a possibility, and if so how can I investigate this? What queries can I run or stats can I check to see if this pending list is approaching work_mem, or we're hitting this cleanup cycle? Is there any logging I can enable to see when these cleanups are occurring?
That is hard to do in 9.4. PostgreSQL 9.5 will add the gin_metapage_info function to the pageinspect module which makes it easier. If you don't mind doing some hacking, you could probably get the new pageinspect to run on 9.4.
Or what I usually do in a case like this is clone the database to a test/QA server then run pg_upgrade to get that running on 9.5, then hope what I learn transfers back to production.
But the symptoms you describe are exactly what I expect from these clean up problems, so I would just assume that that is the problem.
The easiest solution is to turn of fastupdate for that index. Each update will then be individually slower, but you won't have the periodic lock up you currently do. If that doesn't work for you, then you can change the autoanalyze settings for the table so that it get auto-analyzed very frequently.
One test we are doing right now is running VACUUMs continuously on the table to see if this slowness stops happening. We should know within a few hours. If this seems to help, we'll try turning off FASTUPDATE on the index.
Vacuum is overkill (and can be extremely slow to run a large gin index), you just need to get it to autoanalyze by changing the per-table setting of "autovacuum_vacuum_scale_factor" to zero and instead using "autovacuum_analyze_threshold" to control the autovacuum process for that table. Note that a manual ANALYZE will *not* clear the pending list, it has to be an autoanalyze. (Manual VACUUM will clear the pending list, but you might have trouble getting manual VACUUM to complete fast enough)
Any other ideas or suggestions would be appreciated, thanks!
The bottleneck is generally a lot of random IO. Can you put the gin index on SSD?
Cheers,
Jeff