Re: UPDATE runs slow in a transaction

Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx> · Wed, 16 Jul 2008 17:30:56 +0200

Yes, I either run

begin;
... lots of other stuff in a script
explain analyze update ... (called from the same script)
rollback;

or

... lots of other stuff in a script (same as above)
explain analyze update ...

Cheers,
Viktor

Am 16.07.2008 um 16:58 schrieb Pavel Stehule:

this is strange.

what means "run under transaction"?

you did exactly statements in psql console:
begin;
explain analyze select ...
commit?

regards
Pavel Stehule

2008/7/16 Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx>:
Hi Pavel,

thanks for the advice on how to uncorrelate the query.  I must  
admit I
didn't know about the UPDATE ... SET ... FROM ... syntax.

Now the UPDATE runs in an acceptable time inside a transaction,  
however the
query plan still differs when I run it outside.

Outside a transaction:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=0.00..11481.84 rows=65756 width=1087) (actual
time=0.151..323.856 rows=65756 loops=1)
 Merge Cond: (_struct.id = tmp.id)
 ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7084.50
rows=98149 width=1083) (actual time=0.028..137.463 rows=32300  
loops=1)
 ->  Index Scan using idx_tmp__id on tmp  (cost=0.00..3330.02  
rows=65756
width=12) (actual time=0.115..58.601 rows=65756 loops=1)
Total runtime: 2905.580 ms

This looks like an optimal plan and average run time over 5 runs is  
2660 ms.

Inside a transaction:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join  (cost=7427.63..16159.84 rows=65756 width=1087) (actual
time=315.570..574.075 rows=65756 loops=1)
 Merge Cond: (_struct.id = tmp.id)
 ->  Index Scan using "_PK_struct" on _struct  (cost=0.00..7500.50
rows=98149 width=1083) (actual time=0.020..129.915 rows=32300  
loops=1)
 ->  Sort  (cost=7427.63..7592.02 rows=65756 width=12) (actual
time=315.538..333.359 rows=65756 loops=1)
       Sort Key: tmp.id
       Sort Method:  quicksort  Memory: 4617kB
       ->  Seq Scan on tmp  (cost=0.00..2165.56 rows=65756 width=12)
(actual time=10.070..37.411 rows=65756 loops=1)
Trigger for constraint _FK_struct_2_collection: time=1105.892  
calls=32300
Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300
Total runtime: 4955.784 ms

Again, the planner does not use the index on tmp (id) although I  
put an
"ANALYZE tmp" right before the UPDATE.  Average run time over 5  
runs is 4610
ms.

Thanks,
Viktor

Am 16.07.2008 um 15:33 schrieb Pavel Stehule:

Hello

my advice is little bit offtopic, I am sorry. Why you use correlated
subquery? Your update statement should be

update _struct set left_token = tmp.left_token from tmp where
_struct.id = tmp.id;

send output of explain analyze statement, please. etc
explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT
left_token FROM tmp WHERE _struct.id = tmp.id)

regards
Pavel Stehule

2008/7/16 Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx>:

Hi Tom,
Postgres is indeed selecting a bad plan.  Turns out that the  
index I
created
to speed up the UPDATE isn't used inside a transaction block.
Here's the plan for "UPDATE _struct SET left_token = (SELECT  
DISTINCT
left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a  
transaction:
                                     QUERY PLAN

-------------------------------------------------------------------------------------------
Seq Scan on _struct  (cost=0.00..826643.13 rows=98149 width=1083)
SubPlan
  ->  Unique  (cost=8.38..8.40 rows=1 width=4)
        ->  Sort  (cost=8.38..8.39 rows=4 width=4)
              Sort Key: tmp.left_token
              ->  Index Scan using idx_tmp__id on tmp   
(cost=0.00..8.34
rows=4 width=4)
                    Index Cond: ($0 = id)
And inside a transaction:
                                   QUERY PLAN

---------------------------------------------------------------------------------------
Seq Scan on _struct  (cost=100000000.00..3230175260746.00  
rows=32300
width=70)
SubPlan
  ->  Unique  (cost=100002329.99..100002330.01 rows=1 width=4)
        ->  Sort  (cost=100002329.99..100002330.00 rows=4 width=4)
              Sort Key: tmp.left_token
              ->  Seq Scan on tmp  (cost=100000000.00..100002329.95
rows=4 width=4)
                    Filter: ($0 = id)
The high cost of the seqscan on tmp are because I tried disabling
sequential
scans inside the transaction to force an index scan, which Postgres
decided
to ignore in this case.
Putting an ANALYZE tmp and ANALYZE _struct right before the  
UPDATE didn't
help either.  (Also shouldn't the creation of an index on tmp  
(id) take
care
of analyzing that column?)
Thanks,
Viktor
Am 14.07.2008 um 20:52 schrieb Tom Lane:

Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx> writes:

the script below runs very fast when executed alone.  But when I  
call

it from within a transaction block it's so slow that I have to  
abort

it after a while.  Specifically the second-to-last UPDATE seems to

take forever within a transaction while it completes in about 3

seconds outside a transaction.

Since the table you're working on was just created in the same
transaction, there's been no opportunity for autovacuum to run an
ANALYZE on it; that's probably preventing selection of a good plan.
Try throwing in an "ANALYZE tmp" after you load the table.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general