On Tue, Jan 18, 2005 at 11:03:22PM -0300, Alvaro Herrera wrote:
> On Tue, Jan 18, 2005 at 07:33:51PM -0600, Jim C. Nasby wrote:
> > On Wed, Jan 19, 2005 at 02:15:42AM +0100, Florian G. Pflug wrote:
> > > You can, however, accelerate something like "where f in (1,2,3,4)".
> > > You just scan the index 4 times, each time for a different value. Of
> > > course, if the number of values becomes larger and larger, there is a
> > > point where it's more efficient to do a sequential scan _once_,
> > > instead of a few thousand index scans (depending on the number of
> > > rows in the table). The PostgreSQL optimizer tries to estimate this,
> > > and will switch to a seq scan if it would have to do too many index
> > > lookups.
> >
> > Are PostgreSQL btree indexes set up as a linked list, so you can scan
> > forwards and backwards in them?
>
> Yes, they are.
>
> > If so, is the IN processor smart enough to collapse ranges of values
> > into a single index scan
>
> No, it isn't AFAIK.
>
> > (i.e., IN (1,2,3,4,8,9,10) would best be done as an index scan
> > starting at 1 and stopping at >4, and a second scan starting at 8 and
> > stopping at >10)?
>
> That assumes the optimizer knows that the domain can contain integer
> values ... seems a complex and unfruitful analysis to do in general.
>
> Maybe the optimizer could collapse that to a single index scan from 1 to
> 10 and then apply a filter to extract only the values actually
> mentioned. Not sure how workable that is.

It seems like it should just be a SMOC (simple matter of code), though I
suspect better index statistics might be needed first. Given good enough
statistics, it should be easy to ascertain how many index tuples you can
expect between two values (even if it's not an int column, as in this
case). The optimizer should also be able to estimate the cost of scanning
down the index structure, so it should then be easy to figure out which
plan is better.
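For what it's worth, the range-collapsing step itself is cheap. A minimal sketch (in Python, just to illustrate the idea; the function name and representation are my own, not anything in the PostgreSQL source) of turning an integer IN-list into contiguous scan ranges:

```python
def collapse_in_list(values):
    """Collapse an IN-list of integers into (lo, hi) ranges of
    consecutive values, so IN (1,2,3,4,8,9,10) becomes two range
    scans instead of seven individual index lookups.

    Illustrative only -- assumes an integer domain, which is exactly
    the assumption Alvaro points out the optimizer can't make in
    general."""
    ranges = []
    for v in sorted(set(values)):
        if ranges and v == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], v)   # extend the current run
        else:
            ranges.append((v, v))             # start a new run
    return ranges
```

So collapse_in_list([1, 2, 3, 4, 8, 9, 10]) yields [(1, 4), (8, 10)], matching the two scans described above. The hard part isn't this transform; it's knowing when two range scans beat seven lookups, which is where the statistics come in.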
Speaking of index statistics, has anyone looked at ways to determine how
many tuples exist between two different values? For example, Sybase used
to keep statistics in 256 buckets. It would define the range of values
that each bucket covered, and how many tuples in the table fell into that
bucket. It's a very elegant way to deal with tables that have skewed
value distributions, as well as clusters of different values.
--
Jim C. Nasby, Database Consultant               decibel@xxxxxxxxxxx
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
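P.S. To make the bucket idea concrete, here's a rough sketch of how such statistics answer "how many tuples between lo and hi?" -- this is my own illustration of the general histogram technique, not Sybase's actual implementation, and it pro-rates partial bucket overlap by linear interpolation, which is a common simplifying assumption:

```python
def estimate_rows_between(bounds, counts, lo, hi):
    """Estimate tuples with value in [lo, hi] from histogram stats.

    bounds[i]..bounds[i+1] delimits bucket i; counts[i] is how many
    tuples fell into that bucket when stats were gathered.  Buckets
    fully inside [lo, hi] contribute their whole count; partially
    overlapped buckets contribute a fraction proportional to the
    overlapped width (assumes values are spread evenly within a
    bucket, which is the usual simplification)."""
    total = 0.0
    for i, n in enumerate(counts):
        b_lo, b_hi = bounds[i], bounds[i + 1]
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        width = b_hi - b_lo
        if width > 0:
            total += n * (overlap / width)
    return total
```

With 256 buckets, skew takes care of itself: a hot value ends up owning a narrow bucket (or several), so the per-bucket counts stay informative even when the overall distribution is lumpy.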