On 11/10/10 11:14, Mladen Gogala wrote: > On 10/10/2010 8:27 PM, Joshua Tolley wrote: >> It was asserted that reading bigger chunks would help performance; a >> response >> suggested that, at least in Linux, setting readahead on a device would >> essentially do the same thing. Or that's what I got from the thread, >> anyway. >> I'm interested to know how similar performance might be between the large >> block size case and the large readahead case. Comments, anyone? >> > > Craig maybe right, the fact that Oracle is doing direct I/O probably > does account for the difference. The fact is, however, that the question > about slow sequential scan appears with some regularity on PostgreSQL > forums. My guess that a larger chunk would be helpful may not be > correct, but I do believe that there is a problem with a too slow > sequential scan. Bigger chunks are a very traditional solution which > may not work but the problem is still there. Now that, I agree with. BTW, I casually looked into async I/O a little, and it seems the general situation for async I/O on Linux is "pretty bloody awful". POSIX async I/O uses signal-driven completion handlers - but signal queue depth limits mean they aren't necessarily reliable, so you can get lost completions and have to scan the event buffer periodically to catch them. The alternative is running completion handlers in threads, but apparently there are queue depth limit issues there too, as well as the joy that is getting POSIX threading right. I think there was some talk about this on -HACKERS a while ago. Here's the main discussion on async I/O I've found: http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php ... from which it seems that async buffered I/O is poorly supported, if at all, on current Linux kernels. Don't know about the BSDs. As Pg is *really* poorly suited to direct I/O, relying on the OS buffer cache as it does, unbuffered direct I/O isn't really an option. Linux async I/O seems to be designed for network I/O and for monitoring lots of files for changes, rather than for highly concurrent I/O on one or a few files. It shows. Re slow seqscans, there is still plenty of room to move: - Sequential scans cannot (AFAIK) use the visibility map introduced in 8.4 to skip sections of tables that are known to contain only dead tuples not visible to any transaction or free space. This potential optimization could make a big difference in tables with FILLFACTOR or with holes created by certain update patterns. - Covering indexes ("index oriented" table columns) and/or indexes with embedded visibility information could dramatically improve the performance of certain queries by eliminating the need to hit the heap at all, albeit at the cost of trade-offs elsewhere. This would be particularly useful for those classic count() queries. There have been discussions about these on -hackers, but I'm not up with the current status or lack thereof. - There's been recent talk of using pread() rather than lseek() and read() to save on syscall overhead. The difference is probably minimal, but it'd be nice. It is worth being aware of a few other factors: - Sometimes seqscans are actually the fastest option, and people don't realize this, so they try to force index use where it doesn't make sense. This is the cause of a significant number of list complaints. - Slow sequential scans are often a consequence of table bloat. It's worth checking for this. Pg's autovacuum and manual vacuum have improved in performance and usability dramatically over time, but still have room to move. Sometimes people disable autovacuum in the name of a short-lived performance boost, not realizing it'll have horrible effects on performance in the mid- to long- term. - Seqscans can be chosen when index scans are more appropriate if the random_page_cost and seq_page_cost aren't set sensibly, which they usually aren't. This doesn't make seqscans any faster, but it's even worse when you have a good index you're not using. I can't help but wonder if a bundled "quick and dirty benchmark" tool for Pg would be beneficial in helping to determine appropriate values for these settings and for effective io concurrency. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance