Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Some ideas:

You could ALTER TABLE SET (fillfactor=50) to try to maximize use of HOT indices
during UPDATEs (check pg_stat_user_indexes).

You could also ALTER TABLE SET autovacuum parameters for more aggressive vacuuming.

You could recreate indices using the CONCURRENTLY trick
(CREATE INDEX CONCURRENTLY new; DROP old; ALTER .. RENAME;)

On Sat, Feb 23, 2019 at 04:05:51PM -0500, Gunther wrote:
> I am using an SQL queue for distributing work to massively parallel workers.
> Workers come in servers with 12 parallel threads. One of those worker sets
> handles 7 transactions per second. If I add a second one, for 24 parallel
> workers, it scales to 14 /s. Even a third, for 36 parallel workers, I can
> add to reach 21 /s. If I try a fourth set, 48 workers, I end up in trouble.
> But that isn't even so much my problem rather than the fact that in short
> time, the performance will deteriorate, and it looks like that is because
> the queue index deteriorates and needs a REINDEX.
> 
> The queue table is essentially this:
> 
> CREATE TABLE Queue (
>   jobId bigint,
>   action text,
>   pending boolean,
>   result text
> );
> 
> the dequeue operation is essentially this:
> 
> BEGIN
> 
> SELECT jobId, action
>   FROM Queue
>   WHERE pending
>   FOR UPDATE SKIP LOCKED
> 
> which is a wonderful concept with the SKIP LOCKED.
> 
> Then I perform the action and finally:
> 
> UPDATE Queue
>    SET pending = false,
>        result = ?
>  WHERE jobId = ?
> 
> COMMIT
> 
> I thought to keep my index tight, I would define it like this:
> 
> CREATE UNIQUE INDEX Queue_idx_pending ON Queue(jobId) WHERE pending;
> 
> so that only pending jobs are in that index.
> 
> When a job is done, follow up work is often inserted into the Queue as
> pending, thus adding to that index.
> 
> Below is the performance chart.
> 
> The blue line at the bottom is the db server.
> 
> 
> You can see the orange line is the first worker server with 12 threads. It
> settled into a steady state of 7/s ran with 90% CPU for some 30 min, and
> then the database CPU% started climbing and I tried to rebuild the indexes
> on the queue, got stuck there, exclusive lock, no jobs were processing, but
> the exclusive lock was never obtained for too long. So I shut down the
> worker server. Database quiet I could resolve the messed up indexes and
> restarted again. Soon I added a second worker server (green line) starting
> around 19:15. Once settled in they were pulling 14/s together. but you can
> see in just 15 min, the db server CPU % climbed again to over 40% and the
> performance of the workers dropped, their load falling to 30%. Now at around
> 19:30 I stopped them all, REINDEXed the queue table and then started 3
> workers servers simultaneously. They settled in to 21/s but in just 10 min
> again the deterioration happened. Again I stopped them all, REINDEXed, and
> now started 4 worker servers (48 threads). This time 5 min was not enough to
> see them ever settling into a decent 28/s transaction rate, but I guess they
> might have reached that for a minute or two, only for the index
> deteriorating again. I did another stop now started only 2 servers and
> again, soon the index deteriorated again.
> 
> Clearly that index is deteriorating quickly, in about 10,000 transactions.
> 
> BTW: when I said 7/s, it is in reality about 4 times as many transactions,
> because of the follow up jobs that also get added on this queue. So 10,0000
> transactions may be 30 or 40 k transactions before the index deteriorates.
> 
> Do you have any suggestion how I can avoid that index deterioration problem
> smoothly?
> 
> I figured I might just pause all workers briefly to schedule the REINDEX
> Queue command, but the problem with this is that while the transaction
> volume is large, some jobs may take minutes to process, and in that case we
> need to wait minutes to quiet the database with then 47 workers sitting as
> idle capacity waiting for the 48th to finish so that the index can be
> rebuilt!
> 
> Of course I tried to resolve the issue with vacuumdb --analyze (just in case
> if the autovacuum doesn't act in time) and that doesn't help. Vacuumdb
> --full --analyze would probably help but can't work because it required an
> exclusive table lock.
> 
> I tried to just create a new index of the same
> 
> CREATE UNIQUE INDEX Queue_idx2_pending ON Queue(jobId) WHERE pending;
> DROP INDEX Queue_idx_pending;
> ANALYZE Queue;
> 
> but with that I got completely stuck with two indexes where I could not
> remove either of them for those locking issues. And REINDEX will give me a
> deadlock error rightout.
> 
> I am looking for a way to manage that index so that it does not deteriorate.
> 
> May be if I was not defining it with
> 
> ... WHERE pending;
> 
> then it would only grow, but never shrink. May be that helps somehow? I
> doubt it though. Adding to an index also causes deterioration, and most of
> the rows would be irrelevant because they would be past work. It would be
> nicer if there was another smooth way.




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux