Re: Long Running Update

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Fri, 24 Jun 2011 10:12:10 -0500

Harry Mantheakis <harry.mantheakis@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>> It will be a lot faster if you can drop all indices...
> 
> This is counter-intuitive - because the WHERE clause is matching
> the only two indexed fields, and my understanding is that querying
> on indexed fields is faster than querying on fields that are not
> indexed.

Because your UPDATE requires reading every tuple in every page of
both tables, it would be counter-productive to use the indexes. 
Random access is much slower than sequential, so it's fastest to
just blast through both tables sequentially.  The plan you showed
has it scanning through table_B and loading the needed data into RAM
in a hash table, then scanning through table_A and updating each row
based on what is in the RAM hash table.

For each row updated, if it isn't a HOT update, a new entry must be
inserted into every index on table_A, so dropping the indexes before
the update and re-creating them afterward would probably be a very
good idea if you're going to do the whole table in one go, and
possibly even if you're working in smaller chunks.

One thing which might help run time a lot, especially since you
mentioned having a lot of unused RAM, is to run the update with a
very hight work_mem setting in the session running the UPDATE.

> (Note also, that the indexed field is NOT being updated.)

That's one of the conditions for a HOT update.  The other is that
there is space available on the same page for a new version of the
row, in addition to the old version.

-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance