"Campbell, Lance" <lance@xxxxxxxx> writes: > QUERY PLAN > > > ------------------------------------------------------------------------ > > Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32) > Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id) > -> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178 width=28) > -> Hash (cost=927.87..927.87 rows=40787 width=8) > -> Seq Scan on question_number (cost=0.00..927.87 rows=40787 width=8) > > (5 rows) That looks like an entirely reasonable plan. Is it possible some other session was blocking this update with a lock on a record? Was there lots of I/O at the time? You could peek in pg_locks while the update seems frozen. This looks like a one-time administrative job to add a new column, is that it? You might also consider creating a new table with the new data and replacing the old table with the new one with something like: CREATE TABLE new_result_entry AS SELECT fk_result_submission_id, fk_question_id, fk_option_order_id, value, order_id, question_number.question_number FROM result_entry JOIN question_number USING (fk_question_id) CREATE INDEX result_entery_index1n ON new_result_entry USING btree (fk_question_id); ALTER TABLE result_entry RENAME TO old_result_entry ALTER TABLE newresult_entry RENAME TO result_entry Unfortunately (for this use case) any views, triggers, etc which reference the old table will continue to reference the old table after the renames. You'll have to drop and recreate them. That may not be an option if the data is actively being used though. But if it is an option there are a few advantages 1) it'll be a bit faster 2) you can build the indexes on the new data at the end of the creation b) the resulting table and indexes won't have all the old versions taking up space waiting for a vacuum. > Postgresql.conf settings: > shared_buffers = 1GB > work_mem = 10MB > max_fsm_pages = 204800 > random_page_cost = 1.0 > effective_cache_size = 8GB I would suggest keeping random_page_cost at least slightly above 1.0 and effective_cache_size should probably be about 6GB rather than 8 since the shared buffers and other things which use memory reduce the memory available for cache. Also, work_mem could be larger at least for large batch queries like this. None of this is relevant for this query though. Actually I think a larger work_mem can avoid problems with hash joins so you might try that but I don't think it would be choosing it estimated that might happen -- and the estimates all look accurate. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq