Re: Why does the query planner use two full indexes, when a dedicated partial index exists?

Richard Neill <rn214@xxxxxxxxxxxxxxxx> · Sat, 22 Dec 2012 17:29:03 +0000

    I've now installed 9.2. As you said, thanks to the change in 9.2 it
    initially prefers the partial index.

    BUT, after 1 cycle of inserting 500k rows, then deleting them all,
    then starting to insert again, I find that the planner has reverted
    to the former bad behaviour.

Presumably the real work load has this type of turn over happen one row
at a time, rather than all in one giant mass update transaction, right?
  That makes a big difference in the way space is re-used.

Sorry - I meant a "real" workload here. I replayed a whole day's worth 
of real data into the DB, and that's what I meant by a cycle. Everything 
was row-at-a-time.
(It currently takes about an hour to do this)

    Reindexing only takes a couple of seconds, and restores correctness.

Even your slow query is pretty fast.  If you can't afford that, can you
afford to take an exclusive lock for a couple of seconds every few minutes?

Yes, I can. If that's the root cause, I'll do that. But it seems to me 
that I've stumbled upon some rather awkward behaviour that I need to 
understand fully, and if the index is bloating that badly and that 
quickly, then perhaps it's a PG bug (or at least cause for a logfile 
warning).

BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today. 
It still only has 252 matching rows.

    What's going on? Do I need to run reindex in a cron-job? I thought
    that reindex wasn't "normally" needed, and that index bloat happened
    only after every row had changed value hundreds of times.

The partial index is highly leveraged.  If every tuple in the table is
updated once, that amounts to every tuple in the index being updated
25,000 times.

How so? That sounds like O(n_2) behaviour.

For the same reason, it is probably not getting vacuum often enough.
  The default settings have the table vacuumed once 20% of its rows
turns over, but that means the partial index has been turned over many
many times.  You could crank down the auto-vacuum settings for that
table, or run manual vacuum with a cron job.

Vacuum will not unbloat the index, but if you run it often enough it
will keep the bloat from getting too bad in the first place.

Thanks. I've reduced  autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)

As I understand it, both of these can run in parallel, and I have 7 
cores usually idle, while the other is maxed out.

But what I think I'd do is change one of your full indexes to contain
the other column as well, and get rid of the partial index.  It might
not be quite as efficient as the partial index might theoretically be,
but it should be pretty good and also be less fragile.

I'll try that.

Thanks,

Richard

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance