Re: Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

pbj@xxxxxxxxxx · Mon, 3 Nov 2014 12:36:46 -0800

  >
 > On Mon, 11/3/14, Igor Neyman <ineyman@xxxxxxxxxxxxxx> wrote:
 >
 >  -----Original Message-----
 >  From: pgsql-general-owner@xxxxxxxxxxxxxx
 >  [mailto:pgsql-general-owner@xxxxxxxxxxxxxx]  
 >  On Behalf Of pbj@xxxxxxxxxx
 >  Sent: Monday, November 03, 2014 11:34 AM
 >  To: pgsql-general@xxxxxxxxxxxxxx
 >  Subject: [SPAM] -  Performance of UPDATE SET = FROM
 >  vs UPDATE SET = (SELECT ...)
 >  
 >  Why does the UPDATE SET = FROM choose a more poorly
 >  performing plan than the UPDATE SET = (SELECT ...)?  It
 >  seems to me that it is the same join. 
 >
 >  I'm using 9.3.5.
 >
 >  CREATE TABLE orig
 >  (                   
 >          key1    VARCHAR(11)
 >  PRIMARY KEY,                   
 >          time1   TIME
 >  );
 >
 >  INSERT INTO orig (key1, time1)
 >  SELECT
 >          a::TEXT,
 >          (((random()*100)::INT %
 >  24)::TEXT || ':' ||
 >          ((random()*100)::INT %
 >  60)::TEXT)::TIME FROM generate_series(80000000000,
 >  80002000000) a;
 >
 >  CREATE INDEX odx ON orig(key1);
 >
 >  CREATE TABLE second (LIKE orig); 
 >
 >  INSERT INTO second (key1)  
 >          SELECT
 >  (80000000000+(((random()*1000000)::INT) % 1000000))::TEXT
 >          FROM generate_series(1,400000);
 >  
 >  EXPLAIN ANALYZE
 >  UPDATE second SET time1 = orig.time1
 >  FROM orig
 >  WHERE second.key1 = orig.key1;  
 >  
 [.....]
 >  
 >  UPDATE second SET time1 = NULL;
 >  
 >  EXPLAIN ANALYZE
 >  UPDATE second SET time1 = (SELECT orig.time1 FROM
 >  orig,second
 >
 >          WHERE orig.key1 = second.key1
 >  LIMIT 1);           
 >
 [.....]
 >          
 >  These 2 queries are not the same.

 >  
 >  The first query updates rows in the "second" table with the
 >  orig.time1 values based on key1 column match.
 >  The second query finds first possible match (based on key1
 >  column) and assigns orig.time1 value from the matched row to
 >  every record in "second" table.
 >  
 >  Regards,
 >  Igor Neyman

I see that now.  I was trying to reproduce something from work from
memory and got tripped up on a sublety of UPDATE ... SELECT.  The query
I ran at work was like this:

EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
                        WHERE orig.key1 = se.key1);

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Update on second se  (cost=0.00..3390627.00 rows=400000 width=18) (actual time=18698.795..18698.795 rows=0 loops=1)
   ->  Seq Scan on second se  (cost=0.00..3390627.00 rows=400000 width=18) (actual time=7.558..16694.600 rows=400000 loops=1)
         SubPlan 1
           ->  Index Scan using odx on orig  (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=400000)
                 Index Cond: ((key1)::text = (se.key1)::text)
 Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second table entries.
The plan actually runs longer than the UPDATE ... FROM, which squares
with a comment the fine manual.

Thanks!
PJ

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general