Re: Planner enhancement suggestion.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




The problem is that you're now talking about doing 2 index scans instead
of just one and a sort.

	It depends on what you call an index scan :
	a- Scanning just the index (no heap lookup) to create a bitmap
b- Scanning the index and hitting the heap in index order to retrieve the rows

(a) should be quite fast, because indexes generally use less space than the main table, and have good locality of reference. (b) is OK if the table fits in memory, but if it has to seek on every row from the heap...

	So, when doing :
	SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20;

If the category contains few products, using the index on category then sorting is good. However, if the category contains many items, postgres is likely to use the index on price to avoid the sort. It needs to lose time fetching many rows from the heap which will not be in category C. In that case, I guess it would be a win to build a bitmap of the pages containing rows which belongs to category C, and only do the heap lookup on these pages.

I have a query like that. When category C contains cheap products, the index scan on price finds them pretty quick. However if it is a category containing mostly expensive products, the index scan will have to hit most of the table in order to find them. The time needed for the query for these two extreme varies from 1 ms to about 20 ms (and that's because the table is fully cached, or else the worst case would be a lot slower). I would definitely prefer a constant 2 ms. The other solution is to create an index on (category,price), but this path leads to lots, lots of indexes given all the combinations.

The bitmap trick I proposed in my previous post would be even more interesting if the table is clustered on category (which seems a reasonable thing to do).

If the correlation on price is high, it could
still win. As the cost estimator for index scan stands right now,
there's no way such a plan would be chosen unless correlation was
extremely high, however.

	Does the cost estimator know about this kind of correlation ?









[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux