Re: UPDATE on two large datasets is very slow

Steve Gerhardt <ocean@xxxxxxxxxxxxxxxxx> writes:
> # EXPLAIN ANALYZE UPDATE peers2...etc etc
> QUERY PLAN
> -------------------------------------------------------------------------
>   Merge Join  (cost=262518.76..271950.65 rows=14933 width=153) (actual
> time=8477.422..9216.893 rows=26917 loops=1)
>     Merge Cond: ((p.tid = t.tid) AND (p.uid = t.uid))
>     ->  Sort  (cost=177898.12..180004.09 rows=842387 width=65) (actual
> time=7803.248..8073.817 rows=109732 loops=1)
>           Sort Key: p.tid, p.uid
>           ->  Seq Scan on peers2 p  (cost=0.00..25885.87 rows=842387
> width=65) (actual time=0.043..4510.771 rows=647686 loops=1)
>     ->  Sort  (cost=84620.64..85546.64 rows=370400 width=96) (actual
> time=641.438..761.893 rows=55393 loops=1)
>           Sort Key: t.tid, t.uid
>           ->  Seq Scan on temp_p2 t  (cost=0.00..11112.00 rows=370400
> width=96) (actual time=0.093..275.110 rows=55393 loops=1)
>   Total runtime: 192569.492 ms
> (9 rows)

> Essentially, it looks like what it's doing is sorting both tables on the
> WHERE clause, then finding which positions correspond between the two.

You're focusing on the wrong thing --- there's nothing wrong with the plan.
It's only taking 9 seconds to perform the merge join.  The other 183
seconds are going somewhere else; you need to find out where.

One thing that came to mind was triggers, which would be shown in the
EXPLAIN results if you are using a sufficiently recent version of PG
(but you didn't say what you're using) ... however if this is a straight
port of MySQL code it's pretty unlikely to have either custom triggers
or foreign keys, so that is most likely the wrong guess.  It may just be
that it takes that long to update 26917 rows, which would suggest a
configuration problem to me.
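To rule triggers and foreign keys in or out directly, the system catalogs can be queried; a minimal sketch against the peers2 table from the EXPLAIN output (foreign-key constraints appear here too, as RI_ConstraintTrigger entries):

```sql
-- List any triggers attached to peers2; an empty result means
-- no custom triggers and no foreign-key enforcement triggers.
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'peers2'::regclass;
```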

> Anyway, I admit I haven't done a great deal of configuration file tuning
> for the Postgres setup,

shared_buffers, wal_buffers, and checkpoint_segments seem like things
you might need to increase.
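As a rough sketch, assuming a version recent enough to accept memory units in postgresql.conf, starting points might look like this; these are hypothetical values to tune against your RAM and workload, not recommendations, and they require a server restart:

```
shared_buffers = 256MB        # default is far too small for bulk updates
wal_buffers = 1MB             # reduce WAL write contention
checkpoint_segments = 16      # space checkpoints out during heavy writes
```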

Another problem with this approach is that it's not going to take long
before the table is bloated beyond belief, if it's not vacuumed
regularly.  Do you have autovacuum turned on?
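A quick way to check, and a fallback if autovacuum is off (table name taken from the EXPLAIN output above):

```sql
SHOW autovacuum;
-- If it reports 'off', vacuum manually after each bulk update cycle:
VACUUM ANALYZE peers2;
```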

Does the tracker tend to send a lot of null updates (no real change to
the rows)?  If so it'd be worth complicating the query to check for
no-change and avoid the update for unchanged rows.
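One way to express that check, sketched with hypothetical column names (seeded, leeched stand in for whatever the tracker actually updates); IS DISTINCT FROM is used rather than <> so that NULLs compare sanely:

```sql
UPDATE peers2 p
SET    seeded = t.seeded, leeched = t.leeched
FROM   temp_p2 t
WHERE  p.tid = t.tid AND p.uid = t.uid
  -- skip rows where nothing would actually change
  AND (p.seeded IS DISTINCT FROM t.seeded
       OR p.leeched IS DISTINCT FROM t.leeched);
```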

			regards, tom lane
