Re: Running update in chunks?

Jeff Janes <jeff.janes@xxxxxxxxx> · Fri, 25 Jan 2013 09:58:54 -0800

On Fri, Jan 25, 2013 at 3:38 AM, Tim Uckun <timuckun@xxxxxxxxx> wrote:
>>
>> That suggests (to me, at least) that it is related to index updating. Again,
>> your GIN index seems primary candidate.
>>
>> Try running iostat (I think that's available on a Mac) with/without the
>> fillfactor and with/without the GIN index while you do the updates. It's
>> possible your SSD is just behaving oddly under stress.
>>
>
>
> I dropped the index and the numbers shot up tenfold or more.  I don't
> know why postgres feels the need to update the GIN index on the hstore
> field when I am only updating an integer field

When the row gets updated, it might move to some place else.  An index
maps data values to row locations.  So if the location changes, all
indexes need to be updated, even if the data value for that index did
not change.  (Well, I shouldn't say they *need* to change.  The
database could have been designed, with considerable difficulty and
consequences, to leave behind permanent redirect pointers to the new
location.  But it wasn't)

There is a mechanism called HOT update (Heap-Only Tuple) which can
prevent this under certain conditions.

1) Either none of the fields being updated are indexed, or any that
are both updated and indexed are updated to the value they already
have.

2) There is room for a new copy of the tuple on the same page as the old one.

lowering the fillfactor helps with requirement 2, especially since
your tuples are probably wide (because of the hstore column) and so
not many fit on a page.

Note that if you update a field to have the same value as it already
does, it still makes a new copy of the entire tuple anyway.  (It
detects that the :old = :new for HOT-eligibility purposes if the field
is indexed, but not for suppression of copying purposes.  And if the
tuple needs to be copied but there is no room on that page, then it
isn't eligible for HOT after all).

So you should add a where clause to the UPDATE to filter out things
that are unchanged.

> but it looks like I
> need to split the hstore into a different table.

That would be one solution, but I think a better one would be to not
store "make_id" in "imports" in the first place, but instead to always
fetch it by joining "imports" to "models" at query time.

Cheers,

Jeff

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general