Re: Slow count(*) again...

On Monday 11 October 2010 10:46:17 Craig James wrote:
> On 10/9/10 6:47 PM, Scott Marlowe wrote:
> > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil.whelchel@xxxxxxxxx> wrote:
> >> I know that there have been many discussions on the slowness of
> >> count(*) even when an index is involved because the visibility of the
> >> rows has to be checked. In the past I have seen many suggestions about
> >> using triggers and tables to keep track of counts and while this works
> >> fine in a situation where you know what the report is going to be ahead
> >> of time, this is simply not an option when an unknown WHERE clause is
> >> to be used (dynamically generated). I ran into a fine example of this
> >> when I was searching this mailing list, "Searching in 856,646 pages
> >> took 13.48202 seconds. Site search powered by PostgreSQL 8.3."
> >> Obviously at some point count(*) came into play here because the site
> >> made a list of pages (1 2 3 4 5 6 > next). I very commonly make a list
> >> of pages from search results, and the biggest time killer here is the
> >> count(*) portion; even worse, I sometimes have to hit the database
> >> with two SELECT statements, one with OFFSET and LIMIT to get the page
> >> of results I need and another to get the amount of total rows so I can
> >> estimate how many pages of results are available. The point I am
> >> driving at here is that since building a list of pages of results is
> >> such a common thing to do, there need to be some specific high-speed
> >> ways to do this in one query. Maybe an estimate(*) that works like
> >> count but gives an answer from the index without checking visibility? I
> >> am sure that this would be good enough to make a page list, it is
> >> really no big deal if it errs on the positive side; maybe the list of
> >> pages has an extra page off the end. I can live with that. What I can't
> >> live with is taking 13 seconds to get a page of results from 850,000
> >> rows in a table.
> > 
> > 99% of the time you're in a situation where you don't need an exact
> > measure, and assuming analyze has run recently, select reltuples from
> > pg_class for a given table is more than close enough.  I'm sure wrapping
> > that in a simple estimated_rows() function would be easy enough to do.
> 
> First of all, it's not true.  There are plenty of applications that need an
> exact answer.  Second, even if it is only 1%, that means it's 1% of the
> queries, not 1% of people.  Sooner or later a large fraction of developers
> will run into this.  It's probably been the most-asked question I've seen
> on this forum in the four years I've been here.  It's a real problem, and
> it needs a real solution.
> 
> I know it's a hard problem to solve, but can we stop hinting that those of
> us who have this problem are somehow being dense?
> 
> Thanks,
> Craig

That is why I suggested an estimate(*) that works like (a faster) count(*),
except that it may be off a bit. I think that is what he was talking about
when he wrote this.
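
For the whole-table case, Scott's estimated_rows() suggestion is easy enough
to sketch. Something like the following (untested, and the table name is just
an example) pulls the planner's last-analyzed row count straight from the
catalog:

CREATE OR REPLACE FUNCTION estimated_rows(tbl regclass) RETURNS bigint AS $$
    -- reltuples is the row count recorded by the last ANALYZE/VACUUM.
    SELECT reltuples::bigint FROM pg_class WHERE oid = $1;
$$ LANGUAGE sql STABLE;

SELECT estimated_rows('my_table');

Of course that only covers the whole table; it can't take an arbitrary WHERE
clause, which is the case I keep running into.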

I don't think that anyone here is trying to cast any blame; we are just
pointing out that there is a real problem here involving what seems to be a
very common task, and it puts Postgres at a huge disadvantage against other
systems that can do it in less time. There doesn't seem to be any disagreement
that count(*) is slower than it could be due to MVCC and other reasons, which
is fine. However, at the chopping end of the line, if a slow count(*) makes a
Postgres-driven website take, say, a minute to render a web page, Postgres is
completely useless there if it can be replaced with a database engine that can
do the same thing in (much) less time. On my servers, this is the major
sticking point. There are so many people waiting on count(*) that the server
runs out of memory and is forced to stop accepting more connections until some
of the threads finish. This makes for many unhappy customers.
When it comes to serving up web pages that contain a slice of a table with
links to other slices, knowing roughly how many slices there are is very
important. But I think that we can all agree that the exact count is not make
or break (even better if the estimate is a bit high), so an estimate(*)
function that takes some shortcuts to get a much faster response (maybe off a
bit) would solve a huge problem.
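
Until something like estimate(*) exists, the closest workaround I know of is
to ask the planner for its row estimate: run EXPLAIN on the exact query,
dynamic WHERE clause and all, and scrape the rows= figure out of the top plan
node. A rough sketch (untested; the function name is made up, and the number
is only as good as the planner's statistics):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
    rec  record;
    rows integer;
BEGIN
    -- EXPLAIN the caller's query and pull the row estimate
    -- out of the first (top) plan node.
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql;

SELECT count_estimate('SELECT 1 FROM my_table WHERE col LIKE ''foo%''');

It can be off in either direction depending on the statistics, but for
deciding how many page links to draw, that is usually good enough.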

What it all boils down to is webserver response time, and there are two things
slowing that down more than they should. So there are two possible solutions,
either of which would be a big help:
1. A faster count(*), or something like my proposed estimate(*).
2. A way to get the total number of rows matched when using LIMIT and OFFSET,
before LIMIT and OFFSET are applied.

If you are making a web page that contains a few results out of many possible
results, you need two things for sure, and each maps to a problem with
Postgres for this task:
1. You need to know (about) how many rows match in total. This requires a hit
to the database, which requires a scan of the table; as far as I know there is
no way to do this faster than count(*).
2. You need a slice of the data, which requires another scan of the table
using the same WHERE clause as above. This seems like a total waste, because
we just did that scan, with the exception of actually fetching the data.

Why do it twice? If there were a way to get a slice using OFFSET and LIMIT and
also get the number of rows that matched before OFFSET and LIMIT were applied,
you could do the scan once. I think that this would improve things and give
Postgres an edge over other systems.
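
For what it's worth, on 8.4 and later a window function can at least fold the
two queries into one: count(*) OVER () reports how many rows the WHERE clause
matched before LIMIT and OFFSET are applied, in the same scan that fetches the
page. A sketch (the table and column names are made up):

SELECT *,
       count(*) OVER () AS total_rows  -- total matches, pre-LIMIT/OFFSET
FROM my_table
WHERE col LIKE 'foo%'
ORDER BY id
LIMIT 50 OFFSET 100;

It still has to visit every matching row to compute the count, so it doesn't
fix the 13-second case, but it does save the second round trip and the
duplicate scan.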

I hope this makes sense to at least one person in the right place. ;)

-Neil-
