Re: Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

pbj@xxxxxxxxxx · Fri, 7 Nov 2014 04:03:28 -0800

On Thu, Nov 06, 2014 at 02:55:20PM +0000, Shaun Thomas wrote:
> 
> These updates aren't equivalent. It's very important you know this, because you're also inflating your table with a lot of extra updated rows.
> 
> Take the first UPDATE:
> 
> > UPDATE second SET time1 = orig.time1
> > FROM orig
> > WHERE second.key1 = orig.key1;
> 
> If you wrote this as a SELECT, it would look like this:
> 
> SELECT second.time1, orig.time1
>   FROM second
>   JOIN ORIG ON (second.key1 = orig.key1)
> 
> Since second is a many to one subset of orig, you now have several simultaneous updates. Your second UPDATE:
> 
> > UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
> > WHERE orig.key1 = second.key1 LIMIT 1);
> 
> Is equivalent to this SELECT:
> 
> SELECT second.time1,
>        (SELECT orig.time1 FROM orig,second
>          WHERE orig.key1 = second.key1 LIMIT 1)
>   FROM second;
> 
> Meaning you'd only get as many updates as there are rows in second. The difference is your LIMIT 1. However, since you're not using an ORDER BY clause, the actual value you get for time1 will be indeterminate. Something like this would remove the row inflation and fix the random time1 behavior, but I'm not sure it was your intent:
> 
> UPDATE second
>    SET time1 = orig.time1
>   FROM (SELECT DISTINCT ON (key1) key1, time1
>           FROM orig
>          ORDER BY key1, time1 DESC) sub
>  WHERE second.key1 = sub.key1;

I see now that I made more than one mistake.

1) I forgot to INCLUDE INDEXES when creating second.  I would have
   seen dup keys when filling it.

	CREATE TABLE second (LIKE orig INCLUDING INDEXES);

2) I should have used something like this to fill second:

	INSERT INTO second (key1)
		SELECT key1 FROM orig
		ORDER BY random()
		LIMIT 400000;

3) I then incorrectly remembered the query I had written at work.  It
   should have been:

	EXPLAIN ANALYZE
	UPDATE second se SET time1 = (SELECT time1 FROM orig
				WHERE orig.key1 = se.key1);

Once the second table is filled with unique keys, then both UPDATES
should have produced the same results, but the UPDATE FROM is faster
than the UPDATE = SELECT, which is documented.

My original intent was to find out what the performance differences
between the two are.

Thanks for pointing these things out!
PJ

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general