second concurrent update takes forever

Janning Vygen <vygen@xxxxxxxxxxx> · Mon, 8 Feb 2010 18:10:08 +0100

Hi folks,

I don't need this list very often because postgresql works like a charm! But 
today we encountered a rather complicated puzzle for us. We really need your 
help!

we are using postgresql 8.4 on a debian lenny with latest security patches 
applied.

We are running a rather complicated Update statement from time to time which 
is only triggered by administrators. The statement updates about 50.000 rows. 
It takes usually about 10-30 seconds to execute and that's fine for us. 

This time two administrator did run this update at approximately  the same 
time, so the second update started before the first finished.

The first update took about 30 seconds which is quite long but not a problem 
for us. the second update was useless, because everything was already updated. 
Anyway, the second statement took 5 hours!

We were able to reproduce this scenario very easily. We had a local database, 
switched off auto-vacuum daemon and had no other concurrent tasks.

We just ran the statement and while it was running we started the very same 
statement in a different database session. the first statement finished in 3 
seconds, the second statement never returned (we canceled it after an hour or 
so).

so we read the documentation about transaction isolation
http://www.postgresql.org/docs/current/static/transaction-iso.html
we are in read committed isolation. 

It says about UPDATE statements "The search condition of the command (the 
WHERE clause) is re-evaluated to see if the updated version of the row still 
matches the search condition."

the update statement has an additional from clause, I show you slightly  
simplified version of it (to make it more readable)

UPDATE 
   queue 
SET 
   queue_status =
      CASE WHEN status = 0 OR status = 2 
      THEN status + 1 
      ELSE status  
      END, 
   queue_index = 
      CASE WHEN status = 0 OR status = 2 
      THEN updates.index 
      ELSE
        CASE WHEN queue.index >  updates.index 
        THEN updates.index 
        ELSE queue.index 
        END 
      END 
FROM 
( 
   SELECT 
      matchday.group_id AS group_id, 
      min (matchday.index) AS index 
   FROM event NATURAL LEFT JOIN matchday
   WHERE event.event_id IN ( 3033445 ) 
   GROUP BY matchday.group_id 
) AS updates 
WHERE 
   queue.group_id = updates.group_id
;

so the essence is: the statement has a rather long executing sub-select 
statement which takes about a 800ms. I "suspect" postgresql to do the sub-
selection and while updating the first row of this sub-select it sees that this 
row was updated since the update statement (or the transaction) was started. 
And then it re-checks the condition just for this row (or maybe for a bunch of 
rows on the same page). if it comes to updating the next row it happens again.

re-checking 40000+ rows with a very expensive sub-select plan is not good at 
all. 40000 times  800 ms = 10 hours. 

Am I right with my suspicion?
If yes, how can I fix it?

kind regards
Janning

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general