this is strange. what means "run under transaction"? you did exactly statements in psql console: begin; explain analyze select ... commit? regards Pavel Stehule 2008/7/16 Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx>: > Hi Pavel, > > thanks for the advice on how to uncorrelate the query. I must admit I > didn't know about the UPDATE ... SET ... FROM ... syntax. > > Now the UPDATE runs in an acceptable time inside a transaction, however the > query plan still differs when I run it outside. > > Outside a transaction: > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=0.00..11481.84 rows=65756 width=1087) (actual > time=0.151..323.856 rows=65756 loops=1) > Merge Cond: (_struct.id = tmp.id) > -> Index Scan using "_PK_struct" on _struct (cost=0.00..7084.50 > rows=98149 width=1083) (actual time=0.028..137.463 rows=32300 loops=1) > -> Index Scan using idx_tmp__id on tmp (cost=0.00..3330.02 rows=65756 > width=12) (actual time=0.115..58.601 rows=65756 loops=1) > Total runtime: 2905.580 ms > > This looks like an optimal plan and average run time over 5 runs is 2660 ms. > > Inside a transaction: > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=7427.63..16159.84 rows=65756 width=1087) (actual > time=315.570..574.075 rows=65756 loops=1) > Merge Cond: (_struct.id = tmp.id) > -> Index Scan using "_PK_struct" on _struct (cost=0.00..7500.50 > rows=98149 width=1083) (actual time=0.020..129.915 rows=32300 loops=1) > -> Sort (cost=7427.63..7592.02 rows=65756 width=12) (actual > time=315.538..333.359 rows=65756 loops=1) > Sort Key: tmp.id > Sort Method: quicksort Memory: 4617kB > -> Seq Scan on tmp (cost=0.00..2165.56 rows=65756 width=12) > (actual time=10.070..37.411 rows=65756 loops=1) > Trigger for constraint _FK_struct_2_collection: time=1105.892 calls=32300 > Trigger for constraint _FK_struct_2_text: time=1468.009 calls=32300 > Total runtime: 4955.784 ms > > Again, the planner does not use the index on tmp (id) although I put an > "ANALYZE tmp" right before the UPDATE. Average run time over 5 runs is 4610 > ms. > > Thanks, > Viktor > > Am 16.07.2008 um 15:33 schrieb Pavel Stehule: > >> Hello >> >> my advice is little bit offtopic, I am sorry. Why you use correlated >> subquery? Your update statement should be >> >> update _struct set left_token = tmp.left_token from tmp where >> _struct.id = tmp.id; >> >> send output of explain analyze statement, please. etc >> explain analyze UPDATE _struct SET left_token = (SELECT DISTINCT >> left_token FROM tmp WHERE _struct.id = tmp.id) >> >> regards >> Pavel Stehule >> >> >> >> 2008/7/16 Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx>: >>> >>> Hi Tom, >>> Postgres is indeed selecting a bad plan. Turns out that the index I >>> created >>> to speed up the UPDATE isn't used inside a transaction block. >>> Here's the plan for "UPDATE _struct SET left_token = (SELECT DISTINCT >>> left_token FROM tmp WHERE _struct.id = tmp.id)" outside of a transaction: >>> QUERY PLAN >>> >>> >>> ------------------------------------------------------------------------------------------- >>> Seq Scan on _struct (cost=0.00..826643.13 rows=98149 width=1083) >>> SubPlan >>> -> Unique (cost=8.38..8.40 rows=1 width=4) >>> -> Sort (cost=8.38..8.39 rows=4 width=4) >>> Sort Key: tmp.left_token >>> -> Index Scan using idx_tmp__id on tmp (cost=0.00..8.34 >>> rows=4 width=4) >>> Index Cond: ($0 = id) >>> And inside a transaction: >>> QUERY PLAN >>> >>> >>> --------------------------------------------------------------------------------------- >>> Seq Scan on _struct (cost=100000000.00..3230175260746.00 rows=32300 >>> width=70) >>> SubPlan >>> -> Unique (cost=100002329.99..100002330.01 rows=1 width=4) >>> -> Sort (cost=100002329.99..100002330.00 rows=4 width=4) >>> Sort Key: tmp.left_token >>> -> Seq Scan on tmp (cost=100000000.00..100002329.95 >>> rows=4 width=4) >>> Filter: ($0 = id) >>> The high cost of the seqscan on tmp are because I tried disabling >>> sequential >>> scans inside the transaction to force an index scan, which Postgres >>> decided >>> to ignore in this case. >>> Putting an ANALYZE tmp and ANALYZE _struct right before the UPDATE didn't >>> help either. (Also shouldn't the creation of an index on tmp (id) take >>> care >>> of analyzing that column?) >>> Thanks, >>> Viktor >>> Am 14.07.2008 um 20:52 schrieb Tom Lane: >>> >>> Viktor Rosenfeld <rosenfel@xxxxxxxxxxxxxxxxxxxxxxx> writes: >>> >>> the script below runs very fast when executed alone. But when I call >>> >>> it from within a transaction block it's so slow that I have to abort >>> >>> it after a while. Specifically the second-to-last UPDATE seems to >>> >>> take forever within a transaction while it completes in about 3 >>> >>> seconds outside a transaction. >>> >>> Since the table you're working on was just created in the same >>> transaction, there's been no opportunity for autovacuum to run an >>> ANALYZE on it; that's probably preventing selection of a good plan. >>> Try throwing in an "ANALYZE tmp" after you load the table. >>> >>> regards, tom lane >>> >>> -- >>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-general >>> >>> > >