Re: batch update query performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, Apr 8, 2014 at 6:21 AM, Hans Drexler <Hans.Drexler@xxxxxxxxxxxxxxxxxx> wrote:
Dear Jeff, Albe and Heikki,

Let me start by thanking you for your time. It is really nice to have a
real supportive community. Thank you.

After reading the answers, we decided to do an experiment with a
fillfactor of 40% and dropping the index on the is_grc_002 field (but
retaining the other indexes.) The experiment showed a reduction in
run-time to ~125 seconds. That is almost 25 times faster than it was. We
are now doing more tests to verify this fix. We will send a SOLVED
message when the fix is verified (unless you state to not bother...)

We think we understand why the improvement works. Let me state our
understanding here. Please comment if we got it wrong.

Index entries point to record pages. An update on a row results in a new
row instance. If the new instance can be written in the same page as the
old instance, then no indexes need to be updated because the index still
points to the correct page. (Unless the update itself modifies an
indexed value). By specifying a fillfactor of 40%, there will be room
for an updated version of each row in the page.

This is mostly correct.  The index entry does not have *just* a page, it also has an offset to a slot on that page.  However, once it gets to the page there is a mechanism for chaining slots together, so you can still find the new version given the slot of an older version on the same page.  (If there were a way to have the index store *just* the page, then it would be even more useful for HOT, as then only the indexes for the values actually changed would need to get updates, as opposed to now where every index needs to be updated if any index needs to be updated.  But that would have other trade-offs)

 
We assume (sorry) that vacuuming the table will release the space of the
old rows, so that we can again do an update query and reuse the freed up
space in the pages.

Once no transaction can possibly be interested in the old version, then a vacuum can free it up for reuse.  In the special case of old HOT-updated tuples, any other process that happens to visit the page can also clean them up once they are old enough, not just vacuums.  

But if someone has a long running transaction open, even if that transaction never has and never will touch the table being vacuumed, it will still prevent the space from being reused.
 


Jeff, answering your question: The update is done after each cycle. It
will actually also update rows that were already updated before. We
realize this is actually wasteful.


So we might change

update t67cdi_nl_cmp_descr set is_grc_002='Y'

to

update t67cdi_nl_cmp_descr set is_grc_002='Y' where is_grc_002 is null

This will avoid creating new records for records that where already
changed before. This might give us additional speed improvement.


That will probably help a lot.  The HOT update code is smart enough to realize that changing from 'Y' to 'Y' does not prevent the HOT update from working, but it still needs to find room on the same page for a new copy of the tuple or else it cannot use the HOT mechanism anyway.  Once you add this restriction to the where clause, you might find that it is better to leave the index in place and put up with the index updates for those rows which actually do need to be updated, rather than keep dropping the and rebuilding the index.  It would depend on what proportion of the table is getting updated each time.

Cheers,

Jeff

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux