Re: Hardware/OS recommendations for large databases (

"Luke Lonergan" <llonergan@xxxxxxxxxxxxx> · Wed, 23 Nov 2005 09:51:06 -0800

Bruce,

On 11/22/05 4:13 PM, "Bruce Momjian" <pgman@xxxxxxxxxxxxxxxx> wrote:

> Perfect summary.  We have a background writer now.  Ideally we would
> have a background reader, that reads-ahead blocks into the buffer cache.
> The problem is that while there is a relatively long time between a
> buffer being dirtied and the time it must be on disk (checkpoint time),
> the read-ahead time is much shorter, requiring some kind of quick
> "create a thread" approach that could easily bog us down as outlined
> above.

Yes, the question is "how much read-ahead buffer is needed to equate to the
38% of I/O wait time in the current executor profile?"

Asynchronous buffering would seem appropriate if the executor could spend
that 38% of its time doing useful work instead of waiting on I/O.

A background reader is an interesting approach - it would require admin
management of buffers, whereas AIO would leave that in the kernel.  The
advantage over AIO would be more universal platform support, I suppose?
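
For illustration, a background reader can be sketched as a thread that fills
a bounded queue of blocks while the consumer works on earlier ones.  This is
a minimal Python sketch, not PostgreSQL code: the names background_reader and
scan, the 8K block size, and the 16-block read-ahead window are all
assumptions chosen for the example.

```python
import queue
import threading

BLOCK_SIZE = 8192  # assumed block size, matching the 8K pages discussed in the thread


def background_reader(path, out_queue, block_size=BLOCK_SIZE):
    """Read blocks ahead of the consumer and pass them through a bounded queue."""
    try:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # put() blocks once the read-ahead window is full
                out_queue.put(block)
    finally:
        # Sentinel: tells the consumer there is nothing more to read
        out_queue.put(None)


def scan(path, readahead_blocks=16, block_size=BLOCK_SIZE):
    """Consume blocks while the reader thread fetches the next ones."""
    q = queue.Queue(maxsize=readahead_blocks)
    reader = threading.Thread(
        target=background_reader, args=(path, q, block_size), daemon=True
    )
    reader.start()
    total = 0
    while True:
        block = q.get()
        if block is None:
            break
        total += len(block)  # stand-in for real executor work on the block
    reader.join()
    return total
```

The queue size is the "how much read-ahead buffer" knob: it is managed by the
application here, which is the buffer management burden mentioned above.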

> Right now the file system will do read-ahead for a heap scan (but not an
> index scan), but even then, there is time required to get that kernel
> block into the PostgreSQL shared buffers, backing up Luke's observation
> of heavy memcpy() usage.

As evidenced by the 16MB readahead setting still resulting in only 36% IO
wait.
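
Separately from a device-wide readahead setting, an application can give the
kernel a per-file hint that it intends to read sequentially.  The following
is a hedged Python sketch for Unix-like systems; the function name and the
1MB chunk size are hypothetical, and the hint is advisory only.

```python
import os


def sequential_scan_with_hint(path, chunk_size=1 << 20):
    """Hint sequential access to the kernel, then read the file in chunks."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 applies the advice to the whole file
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)  # stand-in for processing the data
    finally:
        os.close(fd)
    return total
```

Note that the hint only affects the kernel page cache; each read still copies
data into the caller's buffer, which is the memcpy() cost raised in the
quoted text.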

> So what are our options?  mmap()?  I have no idea.  Seems larger page
> size does help.

Not sure about that.  We used to run with a 32KB page size and I didn't see
any benefit on seq scan.  I haven't seen tests in this thread that compare
8K to 32K.

- Luke