On 05/17/2011 05:47 AM, Craig Ringer wrote:
> This makes me wonder if Pg attempts to pre-fetch blocks of interest for areas where I/O needs can be known in advance, while there's still other work or other I/O to do. For example, pre-fetching for the next iteration of a nested loop while still executing the prior one. Is it even possible?
Well, remember that a nested loop isn't directly doing any I/O. It's pulling rows from some lower level query node. So the useful question to ask is "how can pre-fetch speed up the table access methods?" That worked out like this:
Sequential Scan: Logic here was added and measured as useful for one system with terrible I/O. Everywhere else it was tried on Linux, the read-ahead logic in the kernel seems to make this redundant. Punted as too much complexity relative to measured average gain. You can try to tweak this on a per-file basis in an application, but the kernel has almost as much information to make that decision usefully as the database does.
Index Scan: It's hard to know what you're going to need in advance here and pipeline the reads, so this hasn't really been explored yet.
Bitmap heap scan: Here, the exact list of blocks to fetch is known in advance, they're random, and it's quite possible for the kernel to schedule them more efficiently than reading them one at a time in order would. This was added as the effective_io_concurrency feature (it's the only thing that feature impacts), which so far is only proven to work on Linux. Any OS that implements the POSIX API it uses will get this too, however; FreeBSD was the next likely candidate that might benefit when I last looked around.
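As a rough illustration of the per-file tweak mentioned for sequential scans above, an application can tell the kernel how it plans to read a file by calling posix_fadvise(). The following is a minimal Python sketch, not anything PostgreSQL itself does; the function name read_sequentially and the chunk size are made up for the example, and os.posix_fadvise is only available on Unix-like systems.

```python
import os
import tempfile


def read_sequentially(path, chunk_size=8192):
    """Read a whole file after hinting that access will be sequential.

    Returns the number of bytes read. The hint is advisory only, so a
    missing or failing posix_fadvise is simply ignored.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0 and len=0 together mean "the whole file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # the read below still works without the hint
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    # Hypothetical demo file; any readable path would do.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (8192 * 16))
    try:
        print(read_sequentially(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Whether this helps at all depends on the kernel and the storage; as noted above, the kernel's own read-ahead often makes the hint redundant.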
> I'm guessing not, because (AFAIK) Pg uses only synchronous blocking I/O, and with that there isn't really a way to pre-fetch w/o threads or helper processes. Linux (at least) supports buffered async I/O, so it'd be possible to submit such prefetch requests ... on modern Linux kernels. Portably doing so, though - not so much.
Linux supports posix_fadvise() with the POSIX_FADV_WILLNEED advice, which is perfect for suggesting what blocks will be accessed in the near future in the bitmap heap scan case. That's how effective_io_concurrency works.
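To make the "will need" idea concrete, here is a small Python sketch of the pattern: given a list of block numbers known up front, advise the kernel about a few upcoming blocks before reading the current one. This is only an illustration under stated assumptions, not PostgreSQL's actual code. The names read_blocks_with_prefetch and prefetch_distance are hypothetical (the latter is only loosely analogous to effective_io_concurrency), and the 8192-byte block size is PostgreSQL's default, used here just for the example.

```python
import os

BLOCK_SIZE = 8192  # illustrative; matches PostgreSQL's default block size


def _advise_willneed(fd, block_no):
    """Hint that one block will be read soon; failure is harmless."""
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, block_no * BLOCK_SIZE, BLOCK_SIZE,
                             os.POSIX_FADV_WILLNEED)
        except OSError:
            pass  # advisory only; the later read still works


def read_blocks_with_prefetch(path, block_numbers, prefetch_distance=4):
    """Read the listed blocks in order, hinting upcoming ones first.

    Returns a list of bytes objects, one per entry in block_numbers.
    """
    blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        advised = 0  # how many entries of block_numbers are already covered
        for i, block_no in enumerate(block_numbers):
            # The current block is about to be read, so never hint it.
            advised = max(advised, i + 1)
            # Issue hints for up to prefetch_distance entries past this one.
            limit = min(len(block_numbers), i + 1 + prefetch_distance)
            while advised < limit:
                _advise_willneed(fd, block_numbers[advised])
                advised += 1
            # Blocking read; ideally the kernel already has the data cached.
            blocks.append(os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE))
    finally:
        os.close(fd)
    return blocks
```

Calling read_blocks_with_prefetch("datafile", [7, 2, 5, 0]) would return those four blocks in the requested order ("datafile" being a placeholder path). The reads themselves stay synchronous; the hints just give the kernel a chance to start the random I/O early and schedule it as it sees fit.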
Both Solaris and Linux also have async I/O mechanisms that could be used instead. Greg Stark built a prototype and there's an obvious speed-up there to be had. But the APIs for this aren't very standard, and it's really hard to rearchitect the PostgreSQL buffer manager to operate in a less synchronous way. Hoping that more kernels support the "will need" API usefully, which meshes very well with how PostgreSQL thinks about the problem, is where things are at right now. With so many bigger PostgreSQL sites on Linux, that's worked out well so far.
--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books