On Mon, Oct 11, 2010 at 12:54 PM, Neil Whelchel <neil.whelchel@xxxxxxxxx> wrote:
2. You need a slice of the data which requires another scan to the table to
get, and using the same WHERE clause as above. This seems like a total waste,
because we just did that with the exception of actually fetching the data.
Why do it twice when if there was a way to get a slice using OFFSET and LIMIT
and get the amount of rows that matched before the OFFSET and LIMIT was
applied you could do the scan once? I think that this would improve things and
give Postgres an edge over other systems.
I'd go even farther with number 2 and suggest that a form of offset/limit which can return the total count OR have a total count be passed in to be returned the same way as if total count were being computed would make the use of that api even easier, since you could keep re-using the number returned the first time without changing the api that gets used depending upon context. Of course, you could contrive to set that up via a stored proc relatively easily by simply doing the count(*) once, then appending it to each row of the offset/limit query by including it in the select statement. Let it optionally receive the total to be used as an input parameter, which if not null will result in the count(*) block being skipped in the proc. You'd incur the full cost of the table scan plus offset/limit query once, but then not for each and every page. Since the modified api you suggest for offset/limit would surely have to perform the table scan once, that solution really isn't giving much more value than implementing as a stored proc other than the flexibility of executing an arbitrary query. Modified offset/limit combined with the count_estimate functionality would be very useful in this circumstance, though - especially if the estimate would just do a full count if the estimate is under a certain threshold. A 25% discrepancy when counting millions of rows is a lot less of an issue than a 25% discrepancy when counting 10 rows.
One issue with an estimation is that you must be certain that the estimate >= actual count or else the app must always attempt to load the page BEYOND the last page of the estimate in order to determine if the estimate must be revised upward. Otherwise, you risk leaving rows out entirely. Probably ok when returning search results. Not so much when displaying a list of assets.