Re: Confusion about composite indexes

Bill Mitchell <bill@xxxxxxxxxxxxxxx> · Mon, 21 May 2012 17:41:37 -0400

Thanks to everybody's input -- as a first-time poster to this listserv,
I wasn't sure how long it would take to get a response. ;)

I was frankly astonished to see that the composite index on (a,b) was
used when I searched for (a), but Chris' response makes total sense.

In this case, I don't want to go with a MAP due to the fact that I'm
actually using Java Hibernate to generate this schema and access it.

My sample query of using RANDOM() to select a random subset of the
overall outlets was actually to try and defeat any prior caching of
results, and give a more reasonable measurement -- but I didn't realize
the implications.  I had thought that coupled with a MAX clause at the
end it would simply randomize and then bail out early instead of a full
table scan - so thanks to Merlin for pointing that out.

I'll go with a 2nd index on MEDIA_ID and do some measurements of speed
increase, but it makes a lot more sense now.

thank you Postgres gurus!  :D

regards
Bill

On 5/21/12 5:11 PM, Merlin Moncure wrote:
> On Mon, May 21, 2012 at 3:36 PM, Dmitriy Igrishin <dmitigr@xxxxxxxxx> wrote:
>>> So you can get fully index lookups on all of a, b, ab, and ba.  the
>>> primary key can't optimize ba because indexes only fully match if
>>> candidate fields are supplied from left to right order.  They can
>>> still help somewhat, but to a lesser degree.
>> BTW, I would like to know is it worth it to create 3rd index on map(a)
>> to reduce the size of the index which will be used by the planer
>> to save some server's RAM (obviously, at the cost of extra disk space) ?
> What Dmitriy is talking about here is that even though an index on
> (a,b) can efficiently (in terms of searching through the tree) match
> terms on just 'a', you still pay a price because the entries on the
> index have to store the data for b as well,  So even though it's
> algorithmically efficient you have to browse more data to do it which
> pressures RAM.  In other words, an index on just 'a' is ideal for
> searches on just 'a', although a,b is much better than (b,a) or no
> index at all.
>
> I personally think that generally it's better not to do that in most
> cases especially if you're indexing integer keys since you're not
> making *that* much difference on the overall index size.  Also,
> primary key indexes are much more likely to have to stay 'hot' in the
> cache anyways since they will be serving fkey reference lookups and
> stuff like that so in the end you might be consuming *more* ram, not
> less.
>
> An exception might be if your key on a,b has a very small 'a' and a
> very large 'b'.  But that's pretty rare in practice and it's usually a
> good idea to avoid indexing large fields if you can help it.  It
> really depends on the workload.
>
> merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general