Re: Slow count(*) again...

Mladen Gogala <mladen.gogala@xxxxxxxxxxx> · Sat, 09 Oct 2010 21:54:15 -0400

Neil Whelchel wrote:
I know that there have been many discussions on the slowness of count(*) even 
when an index is involved because the visibility of the rows has to be 
checked. In the past I have seen many suggestions about using triggers and 
tables to keep track of counts and while this works fine in a situation where 
you know what the report is going to be ahead of time, this is simply not an 
option when an unknown WHERE clause is to be used (dynamically generated).
I ran into a fine example of this when I was searching this mailing list, 
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by 
PostgreSQL 8.3." Obviously at some point count(*) came into play here because 
the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a 
list of pages from search results, and the biggest time killer here is the 
count(*) portion. Worse yet, I sometimes have to hit the database with 
two SELECT statements: one with OFFSET and LIMIT to get the page of results I 
need, and another to get the total number of rows so I can estimate how many 
pages of results are available. The point I am driving at here is that since 
building a list of pages of results is such a common thing to do, there needs 
to be some specific high-speed way to do this in one query. Maybe an 
estimate(*) that works like count but gives an answer from the index without 
checking visibility? I am sure that this would be good enough to make a page 
list; it is really no big deal if it errs on the high side and the 
list of pages has an extra page off the end. I can live with that. What I 
can't live with is taking 13 seconds to get a page of results from 850,000 
rows in a table.
-Neil-
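
For reference, the two-query paging pattern described above can be sketched 
as follows. This is only an illustration under stated assumptions: SQLite 
(from the Python standard library) stands in for PostgreSQL so that the 
sketch runs without a server, and the table "pages", the function 
fetch_page and the page size of 20 are made up for the example. The second 
query is the count(*) that becomes expensive on a large table.

```python
import math
import sqlite3

PAGE_SIZE = 20  # hypothetical page size, not taken from the thread


def fetch_page(conn, where_sql, params, page):
    """Return (rows, total_pages) for a 1-based page number, using two queries."""
    # where_sql is a trusted, dynamically generated fragment with "?" placeholders.
    # Query 1: the requested page, via LIMIT and OFFSET.
    rows = conn.execute(
        f"SELECT id, title FROM pages WHERE {where_sql} ORDER BY id LIMIT ? OFFSET ?",
        (*params, PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()
    # Query 2: the exact total, needed only to size the list of pages.
    (total,) = conn.execute(
        f"SELECT count(*) FROM pages WHERE {where_sql}", params
    ).fetchone()
    return rows, math.ceil(total / PAGE_SIZE)


def demo():
    """Build a 95-row sample table in memory and fetch the last page."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO pages (title) VALUES (?)",
        [(f"page {i}",) for i in range(1, 96)],
    )
    return fetch_page(conn, "title LIKE ?", ("page%",), page=5)
```

Because the page count is just the total divided by the page size and 
rounded up, a total that is slightly too high costs at most an extra page 
at the end of the list, which is the trade-off proposed above.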

Unfortunately, the problem is in the rather primitive way PostgreSQL 
does I/O. That did not change in 9.0, so there is nothing you could gain by 
upgrading. If you execute strace -o /tmp/pg.out -e read -p <PID of the 
sequential scan process> and inspect the file /tmp/pg.out when the query 
finishes, you will notice a gazillion read requests, all of them 8192 
bytes in size. That means that PostgreSQL is reading the table block by 
block, without any merging of the requests. You can alleviate the pain 
with OS tricks, like specifying the deadline I/O scheduler in 
grub.conf and setting prefetch (read-ahead) on the FS block devices with 
blockdev, but there is nothing more that can be done, short of 
rewriting the way PostgreSQL does I/O. There were rumors about 
version 9.0 and asynchronous I/O, but that didn't materialize. That is 
really strange to me, because a PostgreSQL table is a file, or a group of 
files if the table size exceeds 1GB. It wouldn't be very hard to try 
reading 1MB at a time, and that would speed up the full table scan 
significantly.
The problem with single-block I/O is that there is a context switch for each 
request, and the I/O scheduler has to work hard to merge requests 
appropriately. There is really no need for that: tables are files, and 
navigating through files is not a problem, even with much larger blocks.
In another database, whose name I will not mention, there is a parameter, 
db_file_multiblock_read_count, which specifies how many blocks will be 
read by a single read request when doing a full table scan. PostgreSQL is in 
dire need of something similar, and it wouldn't even be that hard to 
implement.
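
To make the arithmetic concrete, here is a minimal sketch of the idea, not 
PostgreSQL code. It scans an ordinary file with a hypothetical 
multiblock_read_count argument (the name is borrowed from the parameter 
mentioned above) and counts how many read requests are issued. The 8 MiB 
temporary file is just a stand-in for a table file, and the sketch assumes 
a regular file where reads come back full except at end of file. It 
measures request counts only, not elapsed time.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # the request size seen in the strace output


def scan(path, multiblock_read_count):
    """Sequentially scan a file; return (blocks_seen, read_requests)."""
    request_size = BLOCK_SIZE * multiblock_read_count
    blocks_seen = 0
    read_requests = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, request_size)  # one read() system call
            if not chunk:
                break  # end of file
            read_requests += 1
            # Hand the data on one 8 KB block at a time, as a scan would.
            for offset in range(0, len(chunk), BLOCK_SIZE):
                block = chunk[offset:offset + BLOCK_SIZE]  # a real scan would process this
                blocks_seen += 1
    finally:
        os.close(fd)
    return blocks_seen, read_requests


def demo():
    """Scan an 8 MiB file with 8 KB requests, then with 1 MiB requests."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * (BLOCK_SIZE * 1024))  # 1024 blocks
        path = f.name
    try:
        return scan(path, 1), scan(path, 128)  # 128 * 8192 bytes = 1 MiB
    finally:
        os.unlink(path)
```

With 1024 blocks in the file, the single-block scan issues 1024 read 
requests and the 1 MiB scan issues 8, while both see the same 1024 blocks.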

--
Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)