Re: Massive parallel queue table causes index deterioration, but REINDEX fails with deadlocks.

MichaelDBA <MichaelDBA@xxxxxxxxxxx> · Mon, 25 Feb 2019 16:07:30 -0500

Was wondering when that 
would come up, taking queuing logic outside the database.  Can be overly
 painful architecting queuing logic in relational databases. imho.

Regards,

Michael Vitale

   	Jeff Janes
        Monday,
 February 25, 2019 3:30 PM

  On
 Sat, Feb 23, 2019 at 4:06 PM Gunther <raj@xxxxxxxx> wrote:

    Hi, 

    I am using an SQL queue for distributing work to massively
      parallel workers.
You should look into 
specialized queueing software.

...

    I figured I might just pause all workers briefly to schedule the
      REINDEX Queue command, but the problem with this is that while the
      transaction volume is large, some jobs may take minutes to
      process, and in that case we need to wait minutes to quiet the
      database with then 47 workers sitting as idle capacity waiting for
      the 48th to finish so that the index can be rebuilt!
The
 jobs that take minutes are themselves the problem.  They prevent tuples
 from being cleaned up, meaning all the other jobs needs to grovel 
through the detritus every time they need to claim a new row.  If you 
got those long running jobs to end, you probably wouldn't even need to 
reindex--the problem would go away on its own as the dead-to-all tuples 
get cleaned up.

Locking a tuple and leaving the
 transaction open for minutes is going to cause no end of trouble on a 
highly active system.  You should look at a three-state method where the
 tuple can be pending/claimed/finished, rather than 
pending/locked/finished.  That way the process commits immediately after
 claiming the tuple, and then records the outcome in another transaction
 once it is done processing.  You will need a way to detect processes 
that failed after claiming a row but before finishing, but implementing 
that is going to be easier than all of this re-indexing stuff you are 
trying to do now.  You would claim the row by updating a field in it to 
have something distinctive about the process, like its hostname and pid,
 so you can figure out if it is still running when it comes time to 
clean up apparently forgotten entries.

Cheers,

Jeff