Re: count(*) and bad design was: Experiences with extensibility

Martijn van Oosterhout <kleptog@xxxxxxxxx> · Thu, 10 Jan 2008 11:57:58 +0100

On Wed, Jan 09, 2008 at 03:28:04PM +0100, Ivan Sergio Borgonovo wrote:
> Let me consider an everyday use where count() looks as the most
> obvious solution: paging.
> 
> I search trough a table and I need to know which is the last page.

There's an often overlooked solution to this. Let's say your count
returns 100,000 records, are you going to give them link to 1000
different pages? Not really. Probably about 10, so really your count
only is interested in an exact result less than 100, or that's it's
more than 100.

By placing the where clause in a subselect with a limit of 101 and a
count() around it you have an upper bound on the cost of the count, the
result 101 simply means "more than 100". Depending on the data you
might just put the limit on the query that fetches the data and using
everything after the 10th record to determine your count on the client
side and skip the extra round trip.

> Is there a way to count based on indexes without taking into account
> deleted rows so to "count" faster?

I've also often pondered whether the SQL standard support for table
sampling would be good here. Sure, you still need to check visibility,
but if you specify that the DB only needs to check 10% of the tuples
and to extrapolate the results from that, you could get a fast yet
reasonably accurate result. IIRC patches for this have been floated on
the lists.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy
Attachment:
signature.asc

Description: Digital signature