Re: count(*) and bad design was: Experiences with extensibility

Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> · Wed, 9 Jan 2008 17:21:24 +0100

On Wed, 09 Jan 2008 16:33:54 +0200
Sim Zacks <sim@xxxxxxxxxxxxxx> wrote:

> Using count(*) is not bad design, though generally it makes sense
> to use it with a where.

I got the impression from others comments that postgresql
under perform other DB even when a where clause on indexed column is
involved.

I may have misinterpreted this but still could someone clarify?

Harald Armin Massa wrote:

> 1) just lie, present an estimate
> 2) do not have MVCC
> 3) store "record deleted info" in index, so you can answer count()
> with only scanning the index

I got the impression that even counting with clauses on on indexed
columns means you'll have to check if columns are still there. That
seems to imply that the extra cost make pg under perform compared to
other DB even in that scenario.

I beg pardon to Harald if I misinterpreted his words.

> Saying using count(*) is bad design means that the only design that
> you can visualize is the specific one that you are using.

I'd be interested in some example by Joshua otherwise I can't
understand what he meant.

If you're interested in all the record in a table, there is no way to
have an "engraved in stone" answer and since there are no "where
clauses" you can cache that info and update it once in a while.
If you have a where clause I'm not expecting an "engraved in stone"
answer but I'd expect to have a quick way to get an estimate and I
still haven't understood if:
a) I could improve my design to avoid count
b) postgres perform as good as other db on count where there is a
where clause
c) is there a way to have a quick estimate avoiding count when there
is a where clause

> There are tons of real world examples where you need count. That is
> why so many people use it as a benchmark.

> Obviously if you have an application where millions of rows are
> added and subtracted every minute, then the value of count is kind
> of vague.

> However, if you are querying a data warehouse that gets populated
> once a month, then count has a huge value. You could use statistics
> in that case, but it is much harder to join against other tables
> when you are using statistics. It is also less intuitive then using
> the SQL standard for finding the number of rows.

Again: paging records. You can't do statistics.
Surely you could optimise and lie... but that comes to a cost
compared to the simplicity of count.

Still everybody knows that a frequent complain about postgresql is it
has a slow count.

I can understand grey answer provided they are coherent.
a) the above claim is false
b) this claim is true just on cases where you could opt for a better
design
c) this claim is false for count without where clause
d) this claim is true
e) ... 

details on b) would be much appreciated. Other cases require just a
yes/no answer.

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/