Multicolumn index - WHERE ... ORDER BY

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi there,

I've got a small question about multicolumn indexes.

I have a table with ~5M rows (43 bytes per column - is that relevant?) (but eventually it could grow up to 50M rows), used to store e-mail logs. I am trying to build a web frontend to search mails in this table.

I usually want only the last mails processed by my mail system, so typically all my queries end with:
... ORDER BY time LIMIT 50;

Before that, I have usually have a WHERE clause on a indexed column. Example of a query I might have:

SELECT id FROM mail WHERE from_address LIKE 'bill%'
ORDER BY time DESC LIMIT 50;

I observed that the ordering adds a significant overhead to my queries - this seems quite logical, because of the ORDER BY which has to inspect every row matching the WHERE clause.

The approach taken by the query planner is one of the following:

1) if it thinks there are "not so much" rows containg 'bill' as prefix of the 'from_address' column, it performs an index scan (or a bitmap index scan) using my index on 'from_address', then sorts all results according to the 'time' column.

2) if it thinks there are "many" rows containing 'bill' as prefix of the 'from_address' column, it performs an reverse index scan using my index on 'time', and looks "sequentially" if the 'from_address' column contains 'bill' as prefix.

The problem is that "not so much" is in my case approx 10K rows sometimes. It seems to be pretty costly to perform an (bitmap) index scan over all these rows. As I only want the first few rows anyway (LIMIT 50), I thought that there had to be some better solution.

The solution I had in mind was to create a multicolumn index over 'from_address' and 'time':

CREATE INDEX idx_from_time ON mail (from_address, time DESC);

so that it could directly use the 'time' ordering and lookup only the first 50 rows using the index.

but... it doesn't work :-) i.e. my multicolumn index is never used. So:
- do you guys have any ideas why it doesn't work?
- do you see an alternative solution?

Infos:
- I use PostgreSQL 8.4.2
- I regularly VACUUM and ANALYZE my db. Statistics look OK.
- I'm relatively new to PostgreSQL, so maybe this question is trivial?

Thanks in advance, and happy holidays!

--
lucas maystre
trainee

open systems ag
raeffelstrasse 29
ch-8045 zurich
t: +41 44 455 74 00
f: +41 44 455 74 01
lum@xxxxxxx

http://www.open.ch

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux