Bruce,

On 11/22/05 4:13 PM, "Bruce Momjian" <pgman@xxxxxxxxxxxxxxxx> wrote:

> Perfect summary. We have a background writer now. Ideally we would
> have a background reader, that reads-ahead blocks into the buffer cache.
> The problem is that while there is a relatively long time between a
> buffer being dirtied and the time it must be on disk (checkpoint time),
> the read-ahead time is much shorter, requiring some kind of quick
> "create a thread" approach that could easily bog us down as outlined
> above.

Yes, the question is "how much read-ahead buffer is needed to equate to the
38% of I/O wait time in the current executor profile?"

The idea of asynchronous buffering would seem appropriate if the executor
could use that 38% of the time for useful work.

A background reader is an interesting approach - it would require admin
management of buffers, whereas AIO would leave that in the kernel. The
advantage over AIO would be more universal platform support, I suppose?

> Right now the file system will do read-ahead for a heap scan (but not an
> index scan), but even then, there is time required to get that kernel
> block into the PostgreSQL shared buffers, backing up Luke's observation
> of heavy memcpy() usage.

As evidenced by the 16MB readahead setting still resulting in only 36% I/O
wait.

> So what are our options? mmap()? I have no idea. Seems larger page
> size does help.

Not sure about that; we used to run with a 32KB page size and I didn't see
a benefit on seq scan at all. I haven't seen tests in this thread that
compare 8K to 32K.

- Luke
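
For illustration only: the "leave read-ahead buffering in the kernel" option
discussed above can be sketched in C with posix_fadvise() and
POSIX_FADV_WILLNEED. This is a minimal sketch under stated assumptions, not
code from PostgreSQL or from this thread. It assumes a POSIX system that
provides posix_fadvise() (Linux, for example); BLOCK_SIZE, PREFETCH_BLOCKS,
prefetch_blocks() and scan_with_prefetch() are hypothetical names chosen for
the example. Note that read() still copies each block from the kernel cache
into the caller's buffer, so the hint can only hide I/O wait, not the
memcpy() cost mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE      8192  /* assumption: 8K pages, as discussed in the thread */
#define PREFETCH_BLOCKS 16    /* hypothetical read-ahead distance, in blocks */

/*
 * Ask the kernel to start reading blocks [first_block, first_block + count)
 * of fd into its page cache. Returns 0 on success, an errno value otherwise.
 */
static int prefetch_blocks(int fd, off_t first_block, int count)
{
    assert(count > 0);
    return posix_fadvise(fd, first_block * BLOCK_SIZE,
                         (off_t) count * BLOCK_SIZE, POSIX_FADV_WILLNEED);
}

/*
 * Read a file sequentially, one block at a time, hinting the next window of
 * blocks every PREFETCH_BLOCKS blocks. Returns the number of bytes read,
 * or -1 on error.
 */
static long long scan_with_prefetch(const char *path)
{
    char buf[BLOCK_SIZE];
    long long total = 0;
    off_t block = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    for (;;) {
        /* The hint is advisory, so a failure here is deliberately ignored. */
        if (block % PREFETCH_BLOCKS == 0)
            (void) prefetch_blocks(fd, block + PREFETCH_BLOCKS, PREFETCH_BLOCKS);

        n = read(fd, buf, sizeof buf);  /* still a kernel-to-user copy */
        if (n <= 0)
            break;
        total += n;
        block++;
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

Calling scan_with_prefetch() on a large file with and without the hint, while
watching I/O wait, would be one way to see how much of the wait the read-ahead
distance can absorb on a given system.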