Tom Lane wrote:
The other and probably worse problem is that there's no application
control over how soon changes to mmap'd pages get to disk. An msync
will flush them out, but the kernel is free to write dirty pages sooner.
So if they're depending for consistency on writes not happening until
msync, it's broken by design. (This is one of the big reasons we don't
use mmap'd space for Postgres disk buffers.)
Well, I agree that it sucks for the reason you give, but you use write(),
and that's *exactly* the same in terms of when the data gets written as
updating a byte on an mmap'd page. And you're quite happy to use write().
The only difference is that it's a lot more explicit where the point of
'maybe it's written and maybe it isn't' occurs.
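
To make the comparison concrete, here is a minimal sketch, assuming a
POSIX-like system and Python's standard os and mmap modules. The function
names and the temp-file demo are hypothetical, for illustration only. In
both paths the modified bytes sit in the kernel page cache and may be
written back at any time; fsync() and msync() (mmap.flush() in Python) only
mark the point after which the data is known to be flushed.

```python
import mmap
import os
import tempfile


def update_with_write(path, offset, data):
    """Update bytes through write(); hypothetical helper for illustration."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        # write() copies into the page cache. The kernel is free to write
        # the dirty page back at any moment from here on.
        os.write(fd, data)
        # fsync() is the explicit point after which the data is flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


def update_with_mmap(path, offset, data):
    """Update bytes through a shared mapping; hypothetical helper."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as m:  # length 0 maps the whole file
            # Storing into the mapping dirties the page. As with write(),
            # the kernel may write it back before we ask.
            m[offset:offset + len(data)] = data
            # flush() calls msync(): the explicit flush point.
            m.flush()
    finally:
        os.close(fd)


def read_back(path, offset, length):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo_flush_points():
    # Create a one-page scratch file so the mapping is non-empty.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * mmap.PAGESIZE)
        path = f.name
    try:
        update_with_write(path, 0, b"write")
        update_with_mmap(path, 16, b"mmap")
        return read_back(path, 0, 5), read_back(path, 16, 4)
    finally:
        os.unlink(path)
```

Neither path gives the application a way to hold a dirty page back from
disk; the sketch only shows where each API lets you force the flush.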
There need be no real difference in the architecture for one over the
other. There does seem to be evidence that read and write can have better
read-ahead and write-behind behaviour, because read/write lets you
initiate an I/O with a size hint that exceeds a hardware page.
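
As a rough illustration of the size-hint point, here is a sketch assuming a
Unix-like system where os.pread is available, and Python 3.8+ for
mmap.madvise. The helper names are hypothetical. A single pread() tells the
kernel the full extent of the request up front, whereas a plain mapping is
populated by page faults unless the application adds an advisory hint such
as MADV_WILLNEED. It makes no claim about which is faster on any given
kernel.

```python
import mmap
import os
import tempfile


def read_span_with_pread(path, offset, length):
    """One syscall; the kernel sees the whole multi-page request size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def read_span_with_mmap(path, offset, length):
    """Read the same span via a mapping, with an optional advisory hint."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            # madvise needs a page-aligned start, so round the offset down.
            start = offset - (offset % mmap.PAGESIZE)
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                # Advisory only: the kernel may or may not read ahead.
                m.madvise(mmap.MADV_WILLNEED, start, offset + length - start)
            # Without the hint, this copy would fault pages in on demand.
            return m[offset:offset + length]
    finally:
        os.close(fd)


def demo_large_read():
    page = mmap.PAGESIZE
    data = bytes(range(256)) * (8 * page // 256)  # 8 pages of patterned data
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        offset, length = page + 100, 4 * page  # deliberately unaligned span
        a = read_span_with_pread(path, offset, length)
        b = read_span_with_mmap(path, offset, length)
        return a == b == data[offset:offset + length]
    finally:
        os.unlink(path)
```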
And yes, after getting into the details while starting to port TC to
Windows, I decided to bin it. Especially handy that SQLite3 has WAL now.
(And one last dig: TC didn't even have a checksum that would let you tell
when it had been broken. It might all be fixed now, of course; I don't
have time to check.)
James
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)