Re: BBU Cache vs. spindles

James Mansion <james@xxxxxxxxxxxxxxxxxxxxxx> · Thu, 28 Oct 2010 21:33:19 +0100

Tom Lane wrote:
> The other and probably worse problem is that there's no application
> control over how soon changes to mmap'd pages get to disk.  An msync
> will flush them out, but the kernel is free to write dirty pages sooner.
> So if they're depending for consistency on writes not happening until
> msync, it's broken by design.  (This is one of the big reasons we don't
> use mmap'd space for Postgres disk buffers.)
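A minimal Python sketch of the pattern described above, assuming a POSIX system, where Python's mmap.flush() calls msync() with MS_SYNC. The helper name update_via_mmap is hypothetical. The store into the mapping dirties the page at once. flush() only bounds how late the write-back can happen. It does not stop the kernel from writing the page earlier.

```python
import mmap

PAGE = mmap.PAGESIZE  # page size as reported by the OS


def update_via_mmap(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes of an existing, non-empty file through a shared mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE means stores reach the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            if offset + len(data) > len(m):
                raise ValueError("update would run past the end of the file")
            # This store dirties the page immediately. From here on the kernel
            # may write it back at any moment; the application cannot hold it
            # back until the flush below.
            m[offset:offset + len(data)] = data
            # msync(MS_SYNC) on POSIX: returns once the dirty range has been
            # written to the file. It sets a deadline, not a gate.
            m.flush()
```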

Well, I agree that it sucks for the reason you give, but you use write(),
and that's *exactly* the same in terms of when the data gets written as
updating a byte on an mmap'd page.

And you're quite happy to use write.

The only difference is that with write() it's a lot more explicit where
the point of 'maybe it's written and maybe it isn't' occurs.
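For comparison, here is the same kind of update through write(). This is a minimal sketch assuming a POSIX system, because os.pwrite is Unix-only. The name update_via_write is hypothetical. After pwrite() returns, the data sits in a dirty page-cache page. On a system with a unified page cache, such as Linux, that is essentially the state an mmap store leaves behind. fsync() is the explicit point where "maybe written" becomes "written".

```python
import os


def update_via_write(path: str, offset: int, data: bytes) -> int:
    """Overwrite bytes of an existing file with pwrite(), then fsync()."""
    fd = os.open(path, os.O_RDWR)
    try:
        # pwrite() copies the bytes into the page cache and returns. The page
        # is now dirty, just like after a store into an mmap'd page, and the
        # kernel may write it back whenever it chooses.
        written = os.pwrite(fd, data, offset)
        # fsync() is the explicit point where "maybe it's written" ends.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```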

There need be no real difference in the architecture for one over the
other. If anything, there does seem to be evidence that read and write can
get better read-ahead and write-behind behaviour, because a read/write call
lets you initiate an I/O with a size hint that exceeds a hardware page.
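The sketch below illustrates the "hint to a size" point and assumes a POSIX system. Each pread() call states up front how many bytes it wants. Here that is 1 MiB, an arbitrary choice equal to 256 pages on a system with 4 KiB pages. With mmap, by contrast, the kernel learns the access pattern from page faults and must infer how far ahead to read. The posix_fadvise() call is an optional, purely advisory extra, and the code skips it where Python does not provide it. Whether one call actually becomes one large I/O is up to the kernel and the device. read_sequential and CHUNK are hypothetical names.

```python
import os

CHUNK = 1 << 20  # 1 MiB per request; an arbitrary, illustrative size


def read_sequential(path: str, chunk: int = CHUNK) -> bytes:
    """Read a whole file using large, explicitly sized pread() requests."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Purely advisory: offset 0 and length 0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        parts = []
        offset = 0
        while True:
            # Each call states the full size wanted, which the kernel may use
            # to issue I/O much larger than a single page.
            buf = os.pread(fd, chunk, offset)
            if not buf:  # end of file
                break
            parts.append(buf)
            offset += len(buf)
        return b"".join(parts)
    finally:
        os.close(fd)
```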

And yes, after getting into the details while starting to port TC to
Windows, I decided to bin it. It's especially handy that SQLite3 has WAL
now. (And one last dig: TC didn't even have a checksum that would let you
tell when it had been broken. It might all be fixed now, of course; I
don't have time to check.)

James

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance