Re: cannot use multicolumn index

"Tomas Vondra" <tv@xxxxxxxx> · Wed, 14 Sep 2011 18:05:23 +0200

On 14 Září 2011, 17:14, MirrorX wrote:
> thx for the answer.
>
> - What is the problem, i.e. what behaviour you expect?
> - How much data is the table?
> - What portion of it matches the conditions?
> - What is the index definition?
>
> i think in my first post i provided most of these details but ->

Hmmm, I haven't received that post and I don't see that in the archives:

http://archives.postgresql.org/pgsql-performance/2011-09/msg00210.php

It's displayed on nabble.com, but it's marked as 'not yet accepted'.
That's strange.

Anyway there's still a lot of missing info - what version of PostgreSQL is
this? What is the table structure, what indexes are there?

> 1) what i expect is to be able to understand why the index is not used and
> if possibly to use it somehow, or recreate it in a better way
> 2) the table has 115 GB and about 700 milion rows

Really? Because the explain analyze output you posted states there are
just 5.760.724 rows, not 700.000.000.

> 3) the result should be less than 10 millions rows

That's about 1.5% of the rows, but it may be much larger portion of the
table. The table is stored by blocks - whenever you need to read a row,
you need to read the whole block.

115GB is about 15.073.280 blocks (8kB). If each row happens to be stored
in a different block, you'll have to read about 66% of blocks (although
you need just 1.4% of rows).

Sure, in reality the assumption 'a different block for each row' is not
true, but with a table this large the block probably won't stay in the
cache (and thus will be read repeatedly from the device).

And that's just the table - you have to read the index too (which is 35GB
in this case).

So it's not just about the 'row selectivity', it's about 'block
selectivity' too.

In short - my guess is the seq scan will be more efficient in this case,
but it's hard to prove without the necessary info.

> 4) the index is a btree

Great, but what are the columns? What data types are used?

BTW I've noticed you stated this in the first post "i have read in the
manual that the multicolumn index can be used only if the clauses of the
query are in the same order as the columns of the index".

That's not true since 8.1, so unless you're using a very old version of
PostgreSQL (8.0 or older), you may use whatever columns you want although
it's not as efficient.

Do you need both columns (xid, xdate) in the WHERE condition, or have you
used one of them just to fulfill the 'leftmost columns' rule by adding a
condition that matches everything? If that's the case, it's hardly going
to improve the effectivity.

I see two possible solutions:

1) partition the table and use constraint_exclusion so that just a small
portion of the table is scanned - there are pros/cons of this solution

2) cluster the table by one of the columns, so that an index scan may be
more effective (but this might hurt other queries and you'll have to do
that repeatedly)

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance