Re: Why does the query planner use two full indexes, when a dedicated partial index exists?

Richard Neill <rjn@xxxxxxxxxxxxxxxx> · Thu, 27 Dec 2012 10:49:58 +0000

        The partial index is highly leveraged.  If every tuple in the
        table is updated once, that amounts to every tuple in the index
>         being updated 25,000 times.

    How so? That sounds like O(n_2) behaviour.

If the table has 5 million rows while the index has 200 (active) rows at
any given time, then to update every row in the table to null and back
again would be 100% turn over of the table.  But each such change would
lead to an addition and then a deletion from the index.  So 100%
turnover of the table would be a 5 million / 200 = 25,000 fold turn of
the index.

Sorry, I was being dense. I misread that as:
   "every time a single tuple in the table is updated, the entire index
    (every row) is updated".
Yes, of course your explanation makes sense.

There is some code that allows a btree index entry to get killed (and so
the slot to be reused) without any vacuum, if a scan follows that entry
and finds the corresponding tuple in the table no longer visible to
anyone.  I have not examined this code, and don't know whether it is
doing its job but just isn't enough to prevent the bloat, or if for some
reason it is not applicable to your situation.

It looks like my solution is going to be a REINDEX invoked from cron, or 
maybe just every 100k inserts.

In terms of trying to improve this behaviour for other PG users in the 
future, are there any more diagnostics I can do for you? Having found a 
special case, I'd like to help permanently resolve it if I can.

Thanks very much again.

Best wishes,

Richard

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance