Re: Slow count(*) again...

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Mon, 11 Oct 2010 12:11:58 +0800

On 11/10/10 11:14, Mladen Gogala wrote:
>  On 10/10/2010 8:27 PM, Joshua Tolley wrote:
>> It was asserted that reading bigger chunks would help performance; a
>> response
>> suggested that, at least in Linux, setting readahead on a device would
>> essentially do the same thing. Or that's what I got from the thread,
>> anyway.
>> I'm interested to know how similar performance might be between the large
>> block size case and the large readahead case. Comments, anyone?
>>
> 
> Craig maybe right, the fact that Oracle is doing direct I/O probably
> does account for the difference. The fact is, however, that the question
> about slow sequential scan appears with some regularity on PostgreSQL
> forums. My guess that a larger chunk would be helpful may not be
> correct, but I do believe that there is a problem with a too slow
> sequential scan.  Bigger chunks are a very traditional solution which
> may not work but the problem is still there.

Now that, I agree with.

BTW, I casually looked into async I/O a little, and it seems the general
situation for async I/O on Linux is "pretty bloody awful". POSIX async
I/O uses signal-driven completion handlers - but signal queue depth
limits mean they aren't necessarily reliable, so you can get lost
completions and have to scan the event buffer periodically to catch
them. The alternative is running completion handlers in threads, but
apparently there are queue depth limit issues there too, as well as the
joy that is getting POSIX threading right. I think there was some talk
about this on -HACKERS a while ago. Here's the main discussion on async
I/O I've found:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php

... from which it seems that async buffered I/O is poorly supported, if
at all, on current Linux kernels. Don't know about the BSDs. As Pg is
*really* poorly suited to direct I/O, relying on the OS buffer cache as
it does, unbuffered direct I/O isn't really an option.

Linux async I/O seems to be designed for network I/O and for monitoring
lots of files for changes, rather than for highly concurrent I/O on one
or a few files. It shows.

Re slow seqscans, there is still plenty of room to move:

- Sequential scans cannot (AFAIK) use the visibility map introduced in
8.4 to skip sections of tables that are known to contain only dead
tuples not visible to any transaction or free space. This potential
optimization could make a big difference in tables with FILLFACTOR or
with holes created by certain update patterns.

- Covering indexes ("index oriented" table columns) and/or indexes with
embedded visibility information could dramatically improve the
performance of certain queries by eliminating the need to hit the heap
at all, albeit at the cost of trade-offs elsewhere. This would be
particularly useful for those classic count() queries. There have been
discussions about these on -hackers, but I'm not up with the current
status or lack thereof.

- There's been recent talk of using pread() rather than lseek() and
read() to save on syscall overhead. The difference is probably minimal,
but it'd be nice.

It is worth being aware of a few other factors:

- Sometimes seqscans are actually the fastest option, and people don't
realize this, so they try to force index use where it doesn't make
sense. This is the cause of a significant number of list complaints.

- Slow sequential scans are often a consequence of table bloat. It's
worth checking for this. Pg's autovacuum and manual vacuum have improved
in performance and usability dramatically over time, but still have room
to move. Sometimes people disable autovacuum in the name of a
short-lived performance boost, not realizing it'll have horrible effects
on performance in the mid- to long- term.

- Seqscans can be chosen when index scans are more appropriate if the
random_page_cost and seq_page_cost aren't set sensibly, which they
usually aren't. This doesn't make seqscans any faster, but it's even
worse when you have a good index you're not using. I can't help but
wonder if a bundled "quick and dirty benchmark" tool for Pg would be
beneficial in helping to determine appropriate values for these settings
and for effective io concurrency.

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance