Re: how to efficiently update tuple in many-to-many relationship?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the suggestions, Tom. But I'm still stumped.

On Apr 9, 2007, at 7:13 PM, Tom Lane wrote:

Drew Wilson <drewmwilson@xxxxxxxxx> writes:
Here's the query plan for a SELECT statement that returns 1,207,161
rows in 6 seconds.
...
And here's the query plan for the UPDATE query that seems to never
complete. (Execution time > 30 minutes.)

Well, the subplan is certainly the same as before, so it seems there are
two possibilities:

* there's something unreasonably inefficient about the hash join being
used to perform the IN (work_mem too small? inefficient-to-compare
datatype? bad data distribution?)
I'm not sure why. The ids are OIDs generated from a sequence, with no deletions.

* the time is actually going into the UPDATE operation proper, or
perhaps some triggers it fires (have you got any foreign keys involving
this table?  what's checkpoint_segments set to?)

You could narrow it down by checking the runtime for
select count(*) from translation_pair_data
  where translation_pair_id in
        (select translation_pair_id from translation_pair_data ...
After I increasing work_mem from 1M to 32M, checkpoint_segments from 3 to 8, (and reloading), the UPDATE operation still takes about 15 minutes (944 seconds) to update 637,712 rows.

Whereas replacing the the "UPDATE ... WHERE translation_pair_id IN" with "SELECT count(*) WHERE translation_pair_id IN" drops the time from 15 minutes to 19 seconds (returning the same 637712 rows.)

If that's slow it's the topmost hash join's fault, else we have
to look at the UPDATE's side effects.

The SELECT is not slow, so its a side effect of the update... Looking at the table definition, there is a "BEFORE ON DELETE" trigger defined, two CHECK constraints for this table, and three foreign keys. Nothing looks suspicious to me.
Any clues in the table description below?

Here's the table definition. (And I've appended updated query plans descriptions.)

MatchBox=# \d translation_pair_data Table "public.translation_pair_data"
       Column        |            Type             |   Modifiers
---------------------+-----------------------------+---------------
translation_pair_id | oid                         | not null
translation_id      | oid                         | not null
history_id          | oid                         | not null
source_id           | oid                         | not null
created_ts          | timestamp without time zone | default now()
last_added_ts       | timestamp without time zone | default now()
obsolete            | boolean                     |
style               | character(1)                |
localizability      | boolean                     |
ui_restricted       | boolean                     |
linguistic          | boolean                     |
gender              | character(1)                |
platforms           | character varying[]         |
is_public           | boolean                     |
Indexes:
    "translation_pair_pkey" PRIMARY KEY, btree (translation_pair_id)
"translation_pair_source_id_key" UNIQUE, btree (source_id, translation_id)
    "translation_pair_created_date" btree (date(created_ts))
    "translation_pair_data_is_public" btree (is_public)
    "translation_pair_source_id" btree (source_id)
"translation_pair_source_id_is_not_obsolete" btree (source_id, obsolete) WHERE obsolete IS NOT TRUE
    "translation_pair_translation_id" btree (translation_id)
Check constraints:
"translation_pair_gender_check" CHECK (gender = 'M'::bpchar OR gender = 'F'::bpchar OR gender = 'N'::bpchar) "translation_pair_style_check" CHECK (style = 'P'::bpchar OR style = 'O'::bpchar OR style = 'N'::bpchar)
Foreign-key constraints:
"translation_pair_history_id_fkey" FOREIGN KEY (history_id) REFERENCES history(history_id) "translation_pair_source_id_fkey" FOREIGN KEY (source_id) REFERENCES source_data(source_id) "translation_pair_translation_id_fkey" FOREIGN KEY (translation_id) REFERENCES translation_data(translation_id)
Triggers:
del_tp_prodtype BEFORE DELETE ON translation_pair_data FOR EACH ROW EXECUTE PROCEDURE eme_delete_tp_prodtype()


Thanks for all your help,

Drew

p.s. here are the updated query plans after bumping work_mem to 32M.

MatchBox=# explain select count(*) from translation_pair_data where translation_pair_id in (select translation_pair_id from translation_pair_data join instance i using (translation_pair_id) join loc_submission ls using(loc_submission_id) where ls.is_public = true); QUERY PLAN ------------------------------------------------------------------------ -------------------------------------------------------------------
Aggregate  (cost=424978.46..424978.47 rows=1 width=0)
   ->  Hash IN Join  (cost=324546.91..420732.64 rows=1698329 width=0)
Hash Cond: (public.translation_pair_data.translation_pair_id = public.translation_pair_data.translation_pair_id) -> Seq Scan on translation_pair_data (cost=0.00..38494.29 rows=1698329 width=4)
         ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
-> Hash Join (cost=66710.78..290643.93 rows=2006718 width=8) Hash Cond: (i.translation_pair_id = public.translation_pair_data.translation_pair_id) -> Hash Join (cost=352.38..169363.36 rows=2006718 width=4) Hash Cond: (i.loc_submission_id = ls.loc_submission_id) -> Seq Scan on instance i (cost=0.00..99016.16 rows=5706016 width=8) -> Hash (cost=283.23..283.23 rows=5532 width=4) -> Index Scan using loc_submission_is_public on loc_submission ls (cost=0.00..283.23 rows=5532 width=4)
                                       Index Cond: (is_public = true)
                                       Filter: is_public
-> Hash (cost=38494.29..38494.29 rows=1698329 width=4) -> Seq Scan on translation_pair_data (cost=0.00..38494.29 rows=1698329 width=4)

The SELECT above takes approx 20s, whereas this UPDATE below takes 944s (15 minutes)

MatchBox=# explain update translation_pair_data set is_public = true where translation_pair_id in (select translation_pair_id from translation_pair_data join instance i using(translation_pair_id) join loc_submission ls using(loc_submission_id) where ls.is_public = true);
                                                             QUERY PLAN
------------------------------------------------------------------------ -------------------------------------------------------------
Hash IN Join  (cost=324546.91..457218.64 rows=1698329 width=90)
Hash Cond: (public.translation_pair_data.translation_pair_id = public.translation_pair_data.translation_pair_id) -> Seq Scan on translation_pair_data (cost=0.00..38494.29 rows=1698329 width=90)
   ->  Hash  (cost=290643.93..290643.93 rows=2006718 width=8)
         ->  Hash Join  (cost=66710.78..290643.93 rows=2006718 width=8)
Hash Cond: (i.translation_pair_id = public.translation_pair_data.translation_pair_id) -> Hash Join (cost=352.38..169363.36 rows=2006718 width=4) Hash Cond: (i.loc_submission_id = ls.loc_submission_id) -> Seq Scan on instance i (cost=0.00..99016.16 rows=5706016 width=8)
                     ->  Hash  (cost=283.23..283.23 rows=5532 width=4)
-> Index Scan using loc_submission_is_public on loc_submission ls (cost=0.00..283.23 rows=5532 width=4)
                                 Index Cond: (is_public = true)
                                 Filter: is_public
               ->  Hash  (cost=38494.29..38494.29 rows=1698329 width=4)
-> Seq Scan on translation_pair_data (cost=0.00..38494.29 rows=1698329 width=4)



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux