On Wed, 1 Apr 2009, david@xxxxxxx wrote:
On Wed, 1 Apr 2009, Mark Kirkwood wrote:
Scott Carey wrote:
A little extra info here: md, LVM, and some other tools do not allow the
file system to use write barriers properly. So those are on the bad list
for data integrity with SAS or SATA write caches without battery back-up.
However, this is NOT an issue on the postgres data partition. Data fsync
still works fine; it's the file system journal that might have out-of-order
writes. For xlogs, write barriers are not important, only fsync() not
lying.
As an additional note, ext4 uses checksums per block in the journal, so it
is resistant to out of order writes causing trouble. The test compared to
here was on ext4, and most likely the speed increase is partly due to
that.
[Looks at Stef's config - 2x 7200 rpm SATA RAID 0] I'm still highly
suspicious of such a system being capable of outperforming one with the
same number of (effective) - much faster - disks *plus* a dedicated WAL
disk pair... unless it is being a little loose about fsync! I'm happy to
believe ext4 is better than ext3 - but not that much!
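
One rough way to check whether a setup is "a little loose about fsync" is to count how many small write+fsync cycles per second a single writer can complete and compare that with the rotation rate of the disk. The sketch below assumes a plain rotating disk with no battery-backed cache, where an honest fsync of the same block cannot complete much more often than once per rotation (about 120/s at 7200 rpm). The function names are made up for illustration.

```python
import os
import tempfile
import time


def fsync_rate(directory, seconds=1.0, payload=b"x" * 512):
    """Return (fsyncs_per_second, count) for small overwrite+fsync cycles.

    A single writer rewrites the same 512 bytes and fsyncs each time,
    which is roughly what a stream of tiny commits looks like.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, payload)
            os.fsync(fd)
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed, count
    finally:
        os.close(fd)
        os.unlink(path)


def rotational_ceiling(rpm):
    """Rough upper bound for honest fsyncs/sec: one per platter rotation."""
    return rpm / 60.0
```

Calling `fsync_rate()` on a directory that lives on the volume in question and getting a rate several times higher than `rotational_ceiling(7200)` would suggest that a write cache (or the filesystem) is absorbing the fsync rather than waiting for the platter. It is only a hint, not proof.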
given how _horrible_ ext3 is with fsync, I can believe it more easily with
fsync turned on than with it off.
I realized after sending this that I needed to elaborate a little more.
over the last week there has been a _huge_ thread on the linux-kernel list
(>400 messages) that is summarized on lwn.net at
http://lwn.net/SubscriberLink/326471/b7f5fedf0f7c545f/
there is a lot of information in this thread, but one big thing is that in
data=ordered mode (the default for most distros) ext3 can end up having to
write all pending data when you do an fsync on one file. In addition,
reading from disk can take priority over writing the journal entry (the IO
scheduler assumes that there is someone waiting for a read, but not for a
write), so if you have one process trying to do an fsync and another
reading from the disk, the one doing the fsync needs to wait until the
disk is idle to get the fsync completed.
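
The first effect can be probed with a small experiment: leave a pile of unsynced data in one file, then time an fsync on a different, tiny file. This is a minimal sketch, not a benchmark; the function names and the backlog size are assumptions for illustration, and the kernel may start writing the backlog on its own before the fsync is issued.

```python
import os
import tempfile
import time


def timed_fsync(fd):
    """Return the wall-clock seconds a single fsync(fd) takes."""
    start = time.monotonic()
    os.fsync(fd)
    return time.monotonic() - start


def fsync_latency_with_backlog(directory, backlog_bytes):
    """Time fsync on a tiny file while another file holds unsynced data."""
    big_fd, big_path = tempfile.mkstemp(dir=directory)
    small_fd, small_path = tempfile.mkstemp(dir=directory)
    try:
        # Build up dirty data in the big file and never fsync it.
        chunk = b"\0" * (1 << 20)
        written = 0
        while written < backlog_bytes:
            written += os.write(big_fd, chunk[: backlog_bytes - written])
        # The small file stands in for a commit record.
        os.write(small_fd, b"commit record\n")
        return timed_fsync(small_fd)
    finally:
        for fd, path in ((big_fd, big_path), (small_fd, small_path)):
            os.close(fd)
            os.unlink(path)
```

On a filesystem that behaves the way ext3 data=ordered is described above, the returned time would be expected to grow with `backlog_bytes`; where fsync only has to flush the one file, it should stay roughly flat.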
ext4 does things differently enough that fsyncs are relatively cheap again
(like they are on XFS, ext2, and other filesystems). the tradeoff is that
if you _don't_ do an fsync there is an increased window where you will get
data corruption if you crash.
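
For an application that wants to close that window itself, the usual approach is to fsync explicitly before relying on the data. A minimal sketch, assuming a POSIX system; `durable_replace` is a hypothetical helper name, not something from the thread:

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace the file at path with data, fsyncing before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the new contents to disk before the rename makes
            # them visible under the final name.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is recorded (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the `os.fsync()` calls makes this faster, at the cost of the crash window described above.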
David Lang
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)