Re: Slow count(*) again...

Craig James <craig_james@xxxxxxxxxxxxxx> · Mon, 11 Oct 2010 10:46:17 -0700

On 10/9/10 6:47 PM, Scott Marlowe wrote:
On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil.whelchel@xxxxxxxxx>  wrote:
I know that there haven been many discussions on the slowness of count(*) even
when an index is involved because the visibility of the rows has to be
checked. In the past I have seen many suggestions about using triggers and
tables to keep track of counts and while this works fine in a situation where
you know what the report is going to be ahead of time, this is simply not an
option when an unknown WHERE clause is to be used (dynamically generated).
I ran into a fine example of this when I was searching this mailing list,
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3." Obviously at some point count(*) came into play here because
the site made a list of pages (1 2 3 4 5 6>  next). I very commonly make a
list of pages from search results, and the biggest time killer here is the
count(*) portion, even worse yet, I sometimes have to hit the database with
two SELECT statements, one with OFFSET and LIMIT to get the page of results I
need and another to get the amount of total rows so I can estimate how many
pages of results are available. The point I am driving at here is that since
building a list of pages of results is such a common thing to do, there need
to be some specific high speed ways to do this in one query. Maybe an
estimate(*) that works like count but gives an answer from the index without
checking visibility? I am sure that this would be good enough to make a page
list, it is really no big deal if it errors on the positive side, maybe the
list of pages has an extra page off the end. I can live with that. What I
can't live with is taking 13 seconds to get a page of results from 850,000
rows in a table.

99% of the time in the situations you don't need an exact measure, and
assuming analyze has run recently, select rel_tuples from pg_class for
a given table is more than close enough.  I'm sure wrapping that in a
simple estimated_rows() function would be easy enough to do.

First of all, it's not true.  There are plenty of applications that need an exact answer.  Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people.  Sooner or later a large fraction of developers will run into this.  It's probably been the most-asked question I've seen on this forum in the four years I've been here.  It's a real problem, and it needs a real solution.

I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense?

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance