On Tue, Mar 07, 2006 at 07:09:15PM +0100, PFC wrote: > > >The problem is that you're now talking about doing 2 index scans instead > >of just one and a sort. > > It depends on what you call an index scan : > a- Scanning just the index (no heap lookup) to create a bitmap Sure, and then you scan the other index and read the heap at the same time (b). Your plan requires doing both. The question is: in what cases will it be faster to scan the extra index and build the bitmap vs. just doing a sort. > b- Scanning the index and hitting the heap in index order to > retrieve the rows > > (a) should be quite fast, because indexes generally use less space > than the main table, and have good locality of reference. (b) is OK if the > table fits in memory, but if it has to seek on every row from the heap... If the table fits in memory, who cares? A sort should be damn fast at that point, because you're dealing with a small set of data. > So, when doing : > SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20; > > If the category contains few products, using the index on category > then sorting is good. > However, if the category contains many items, postgres is likely to > use the index on price to avoid the sort. It needs to lose time fetching Have you actually seen this behavior? My experience is that you have to have a correlation somewhere over 80-90% before an index scan is favored over a seqscan + sort (which as I mentioned before appears to be broken). > The bitmap trick I proposed in my previous post would be even more > interesting if the table is clustered on category (which seems a > reasonable thing to do). In which case it's highly unlikely that using the price index will buy you anything. > Does the cost estimator know about this kind of correlation ? Yes. The problem is that the index scan cost estimator figures out a best and worst case cost, and then interpolates between the two using correlation^2. IMO it should be using abs(correlation) to do this, and there's some data at http://stats.distributed.net/~decibel/ that backs this up. There's also been some discussions on -hackers (search the archives for "index cost correlation nasby"), but I've not had time to follow up on this. If you wanted to test a new index cost formula it would be a one line change to the code. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@xxxxxxxxxxxxx Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461