Re: UPDATE on two large datasets is very slow

Steve Gerhardt <ocean@xxxxxxxxxxxxxxxxx> · Tue, 03 Apr 2007 16:06:06 -0700

Tom Lane wrote:

You're focusing on the wrong thing --- there's nothing wrong with the plan.
It's only taking 9 seconds to perform the merge join.  The other 183
seconds are going somewhere else; you need to find out where.

One thing that came to mind was triggers, which would be shown in the
EXPLAIN results if you are using a sufficiently recent version of PG
(but you didn't say what you're using) ... however if this is a straight
port of MySQL code it's pretty unlikely to have either custom triggers
or foreign keys, so that is most likely the wrong guess.  It may just be
that it takes that long to update 26917 rows, which would suggest a
configuration problem to me.

Any suggestions for finding out where all the time is being spent? I'm 
running 8.2.0 by the way, bit boneheaded of me to not mention that in 
the original message, but I'm planning on upgrading to 8.2.3 soon. I 
don't have any triggers or other procedures set up that would interrupt 
this, which is why I'm really confused as to the enormous runtime. I 
agree that the merge join only takes 9 seconds, but it looks to me like 
the 183 seconds are spent sequential scanning both tables, then sorting 
the results, which I imagine would be necessary for the merge join to 
take place. Sadly, I'm not familiar enough with the internals to know if 
this is the case or not.

shared_buffers, wal_buffers, and checkpoint_segments seem like things
you might need to increase.

I'll try modifying those and report back with what kind of performance 
increases I can get.

Another problem with this approach is that it's not going to take long
before the table is bloated beyond belief, if it's not vacuumed
regularly.  Do you have autovacuum turned on?

Does the tracker tend to send a lot of null updates (no real change to
the rows)?  If so it'd be worth complicating the query to check for
no-change and avoid the update for unchanged rows.

The tracker is set up to run a VACUUM ANALYZE after each commit; I 
neglected to mention that. From the testing I've done, it seems like 
performance is more or less the same whether the table has been vacuumed 
recently. Also, the tracker specifically ignores null updates where no 
data is changed to cut down on the size of the data being sent.

Also, if you don't mind answering, I've been pretty puzzled why the two 
stored procedures are substantially slower than the original method, 
since the "concept" in my head seems like they would be a lot more 
simple. Am I missing something huge with the way Postgres works?

			regards, tom lane

Thanks for the help, Tom.

Steve Gerhardt