Re: how to efficiently update tuple in many-to-many relationship?

Drew Wilson <drewmwilson@xxxxxxxxx> · Mon, 9 Apr 2007 22:46:29 -0700

Thanks for the suggestions, Tom. But I'm still stumped.

On Apr 9, 2007, at 7:13 PM, Tom Lane wrote:

Drew Wilson <drewmwilson@xxxxxxxxx> writes:
Here's the query plan for a SELECT statement that returns 1,207,161
rows in 6 seconds.
...
And here's the query plan for the UPDATE query that seems to never
complete. (Execution time > 30 minutes.)

Well, the subplan is certainly the same as before, so it seems  
there are
two possibilities:

* there's something unreasonably inefficient about the hash join being
used to perform the IN (work_mem too small? inefficient-to-compare
datatype? bad data distribution?)
I'm not sure why. The ids are OIDs generated from a sequence, with no  
deletions.

* the time is actually going into the UPDATE operation proper, or
perhaps some triggers it fires (have you got any foreign keys  
involving
this table?  what's checkpoint_segments set to?)

You could narrow it down by checking the runtime for
select count(*) from translation_pair_data
  where translation_pair_id in
        (select translation_pair_id from translation_pair_data ...
After I increasing work_mem from 1M to 32M, checkpoint_segments from  
3 to 8, (and reloading), the UPDATE operation still takes about 15  
minutes (944 seconds) to update 637,712 rows.

Whereas replacing the the "UPDATE ... WHERE translation_pair_id IN"  
with "SELECT count(*) WHERE translation_pair_id IN" drops the time  
from 15 minutes to 19 seconds (returning the same 637712 rows.)

If that's slow it's the topmost hash join's fault, else we have
to look at the UPDATE's side effects.

The SELECT is not slow, so its a side effect of the update... Looking  
at the table definition, there is a "BEFORE ON DELETE" trigger  
defined, two CHECK constraints for this table, and three foreign  
keys. Nothing looks suspicious to me.
Any clues in the table description below?

Here's the table definition. (And I've appended updated query plans  
descriptions.)

MatchBox=# \d translation_pair_data               Table  
"public.translation_pair_data"
       Column        |            Type             |   Modifiers
---------------------+-----------------------------+---------------
translation_pair_id | oid                         | not null
translation_id      | oid                         | not null
history_id          | oid                         | not null
source_id           | oid                         | not null
created_ts          | timestamp without time zone | default now()
last_added_ts       | timestamp without time zone | default now()
obsolete            | boolean                     |
style               | character(1)                |
localizability      | boolean                     |
ui_restricted       | boolean                     |
linguistic          | boolean                     |
gender              | character(1)                |
platforms           | character varying[]         |
is_public           | boolean                     |
Indexes:
    "translation_pair_pkey" PRIMARY KEY, btree (translation_pair_id)
    "translation_pair_source_id_key" UNIQUE, btree (source_id,  
translation_id)
    "translation_pair_created_date" btree (date(created_ts))
    "translation_pair_data_is_public" btree (is_public)
    "translation_pair_source_id" btree (source_id)
    "translation_pair_source_id_is_not_obsolete" btree (source_id,  
obsolete) WHERE obsolete IS NOT TRUE
    "translation_pair_translation_id" btree (translation_id)
Check constraints:
    "translation_pair_gender_check" CHECK (gender = 'M'::bpchar OR  
gender = 'F'::bpchar OR gender = 'N'::bpchar)
    "translation_pair_style_check" CHECK (style = 'P'::bpchar OR  
style = 'O'::bpchar OR style = 'N'::bpchar)
Foreign-key constraints:
    "translation_pair_history_id_fkey" FOREIGN KEY (history_id)  
REFERENCES history(history_id)
    "translation_pair_source_id_fkey" FOREIGN KEY (source_id)  
REFERENCES source_data(source_id)
    "translation_pair_translation_id_fkey" FOREIGN KEY  
(translation_id) REFERENCES translation_data(translation_id)
Triggers:
    del_tp_prodtype BEFORE DELETE ON translation_pair_data FOR EACH  
ROW EXECUTE PROCEDURE eme_delete_tp_prodtype()

Thanks for all your help,

Drew

p.s. here are the updated query plans after bumping work_mem to 32M.

MatchBox=# explain select count(*) from  
translation_pair_data                                                    

                            where translation_pair_id in (select  
translation_pair_id from translation_pair_data join instance i using 
(translation_pair_id) join loc_submission ls using(loc_submission_id)  
where ls.is_public = true);

QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------------------
Aggregate  (cost=424978.46..424978.47 rows=1 width=0)
   ->  Hash IN Join  (cost=324546.91..420732.64 rows=1698329 width=0)
         Hash Cond:  
(public.translation_pair_data.translation_pair_id =  
public.translation_pair_data.translation_pair_id)
         ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29  
rows=1698329 width=4)
         ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
               ->  Hash Join  (cost=66710.78..290643.93 rows=2006718  
width=8)
                     Hash Cond: (i.translation_pair_id =  
public.translation_pair_data.translation_pair_id)
                     ->  Hash Join  (cost=352.38..169363.36  
rows=2006718 width=4)
                           Hash Cond: (i.loc_submission_id =  
ls.loc_submission_id)
                           ->  Seq Scan on instance i   
(cost=0.00..99016.16 rows=5706016 width=8)
                           ->  Hash  (cost=283.23..283.23 rows=5532  
width=4)
                                 ->  Index Scan using  
loc_submission_is_public on loc_submission ls  (cost=0.00..283.23  
rows=5532 width=4)
                                       Index Cond: (is_public = true)
                                       Filter: is_public
                     ->  Hash  (cost=38494.29..38494.29 rows=1698329  
width=4)
                           ->  Seq Scan on translation_pair_data   
(cost=0.00..38494.29 rows=1698329 width=4)

The SELECT above takes approx 20s, whereas this UPDATE below takes  
944s (15 minutes)

MatchBox=# explain update translation_pair_data set is_public = true  
where translation_pair_id in (select translation_pair_id from  
translation_pair_data join instance i using(translation_pair_id) join  
loc_submission ls using(loc_submission_id) where ls.is_public = true);
                                                             QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------------------
Hash IN Join  (cost=324546.91..457218.64 rows=1698329 width=90)
   Hash Cond: (public.translation_pair_data.translation_pair_id =  
public.translation_pair_data.translation_pair_id)
   ->  Seq Scan on translation_pair_data  (cost=0.00..38494.29  
rows=1698329 width=90)
   ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
         ->  Hash Join  (cost=66710.78..290643.93 rows=2006718 width=8)
               Hash Cond: (i.translation_pair_id =  
public.translation_pair_data.translation_pair_id)
               ->  Hash Join  (cost=352.38..169363.36 rows=2006718  
width=4)
                     Hash Cond: (i.loc_submission_id =  
ls.loc_submission_id)
                     ->  Seq Scan on instance i   
(cost=0.00..99016.16 rows=5706016 width=8)
                     ->  Hash  (cost=283.23..283.23 rows=5532 width=4)
                           ->  Index Scan using  
loc_submission_is_public on loc_submission ls  (cost=0.00..283.23  
rows=5532 width=4)
                                 Index Cond: (is_public = true)
                                 Filter: is_public
               ->  Hash  (cost=38494.29..38494.29 rows=1698329 width=4)
                     ->  Seq Scan on translation_pair_data   
(cost=0.00..38494.29 rows=1698329 width=4)