On Apr 14, 2008, at 3:31 PM, Tom Lane wrote:
> Gregory Stark <stark@xxxxxxxxxxxxxxxx> writes:
>> The transition domain where performance drops dramatically as the
>> database starts to not fit in shared buffers but does still fit in
>> filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache. What's interesting is that there seems to be no
> synergy at all between shared_buffers and the filesystem cache.
> Ideally, very hot pages would stay in shared buffers and drop out of
> the kernel cache, allowing you to use a database approximating
> all-of-RAM before you hit the performance wall. It's clear that in
> this example that's not happening, or at least that only a small part
> of shared buffers isn't getting duplicated in filesystem cache.
I suspect that we're getting double-buffering on everything because every time we dirty a buffer and write it out, the OS treats that write as an access and keeps the data in its cache. It would be interesting to try to overcome that and see how it impacts things. With our improvement in checkpoint handling, we might be able to just write via DIO... if not, maybe there's some way to tell the OS to buffer the write for us but target that data for removal from cache as soon as it's written.
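A rough sketch of that second idea on Linux would be posix_fadvise() with POSIX_FADV_DONTNEED (not actual PostgreSQL code; write_and_drop is just an illustrative name, and the advice is only a hint to the kernel, which on Linux ignores it for still-dirty pages, hence the fdatasync() first):

    /*
     * Minimal sketch, not PostgreSQL code: write a buffer, force it to
     * disk, then hint the kernel to evict its copy from the page cache.
     */
    #include <fcntl.h>
    #include <unistd.h>

    static int
    write_and_drop(int fd, const void *buf, size_t len, off_t off)
    {
        if (pwrite(fd, buf, len, off) != (ssize_t) len)
            return -1;
        if (fdatasync(fd) != 0)     /* data must hit disk first... */
            return -1;
        /* ...then ask the kernel to drop its cached copy */
        return posix_fadvise(fd, off, (off_t) len, POSIX_FADV_DONTNEED);
    }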
> Of course, that's because pgbench reads a randomly-chosen row of
> "accounts" in each transaction, so that there's exactly zero locality
> of access. A more realistic workload would probably have a Zipfian
> distribution of account number touches, and might look a little
> better on this type of test.
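For what it's worth, drawing account ids Zipfian instead of uniform is cheap to prototype. Here's a sketch using inverse transform sampling against a precomputed CDF table (illustrative only; the function names and the skew parameter s are made up, not anything in pgbench):

    /*
     * Illustrative sketch -- not pgbench code.  Draw 1-based account
     * "ranks" from a Zipfian distribution with skew parameter s.
     */
    #include <stdlib.h>
    #include <math.h>

    /* Build the cumulative distribution once: cdf[i] = P(rank <= i + 1). */
    static double *
    build_zipf_cdf(int n, double s)
    {
        double *cdf = malloc(n * sizeof(double));
        double  norm = 0.0;
        double  running = 0.0;
        int     i;

        if (cdf == NULL)
            return NULL;
        for (i = 1; i <= n; i++)
            norm += 1.0 / pow((double) i, s);
        for (i = 1; i <= n; i++)
        {
            running += (1.0 / pow((double) i, s)) / norm;
            cdf[i - 1] = running;
        }
        return cdf;
    }

    /* Binary-search the CDF for a uniform draw; low ranks come up hot. */
    static int
    zipf_draw(const double *cdf, int n)
    {
        double  u = (double) rand() / (double) RAND_MAX;
        int     lo = 0,
                hi = n - 1;

        while (lo < hi)
        {
            int mid = (lo + hi) / 2;

            if (cdf[mid] < u)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo + 1;
    }

With s around 1, a handful of low-numbered accounts soak up most of the touches, so the hot set would actually fit in shared buffers and the test would exercise the caching synergy question instead of pure random I/O.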
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@xxxxxxxxxxx
Give your computer some brain candy! www.distributed.net Team #1828