Re: Raid 10 chunksize

On Wed, 1 Apr 2009, david@xxxxxxx wrote:

On Wed, 1 Apr 2009, Mark Kirkwood wrote:

Scott Carey wrote:

A little extra info here: md, LVM, and some other tools do not allow the file system to use write barriers properly, so they are on the bad list
for data integrity with SAS or SATA write caches without battery back-up.
However, this is NOT an issue on the postgres data partition.  Data fsync
still works fine; it's the file system journal that might have out-of-order
writes.  For xlogs, write barriers are not important, only that fsync() is
not lying.

As an additional note, ext4 uses checksums per block in the journal, so it
is resistant to out-of-order writes causing trouble.  The test being compared
here was on ext4, and most likely the speed increase is partly due to that.



[Looks at Stef's config - 2x 7200 rpm SATA RAID 0] I'm still highly suspicious of such a system being capable of outperforming one with the same number of (effective) - much faster - disks *plus* a dedicated WAL disk pair... unless it is being a little loose about fsync! I'm happy to believe ext4 is better than ext3 - but not that much!

given how _horrible_ ext3 is with fsync, I can believe it more easily with fsync turned on than with it off.

I realized after sending this that I needed to elaborate a little more.

over the last week there has been a _huge_ thread on the linux-kernel list (>400 messages) that is summarized on lwn.net at http://lwn.net/SubscriberLink/326471/b7f5fedf0f7c545f/

there is a lot of information in this thread, but one big thing is that in data=ordered mode (the default for most distros) ext3 can end up having to write all pending data when you do an fsync on one file. In addition, reading from disk can take priority over writing the journal entry (the IO scheduler assumes that there is someone waiting for a read, but not for a write), so if you have one process trying to do an fsync and another reading from the disk, the one doing the fsync needs to wait until the disk is idle to get the fsync completed.

ext4 does things differently enough that fsyncs are relatively cheap again (like they are on XFS, ext2, and other filesystems). the tradeoff is that if you _don't_ do an fsync there is an increased window where you will get data corruption if you crash.
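
for anyone who wants to see the fsync cost on their own filesystem, here is a rough sketch in Python (not something from the linux-kernel thread). the function name time_fsyncs and its parameters are made up for illustration. it only times fsync() on a scratch file; to see the ext3 data=ordered effect described above you would need other processes writing or reading on the same filesystem at the same time, and you should point it at a directory on the filesystem under test rather than a tmpfs.

```python
import os
import tempfile
import time


def time_fsyncs(directory, rounds=5, payload=b"x" * 8192):
    """Append `payload` to a scratch file in `directory` and time each fsync().

    Returns a list of fsync latencies in seconds, one per round.
    The name and parameters are illustrative assumptions, not a standard tool.
    """
    latencies = []
    # Create the scratch file on the filesystem we want to measure.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # ask the kernel to push this file's data to disk
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # tempfile.gettempdir() is only a default; it may be a tmpfs,
    # where fsync is close to free and tells you nothing about ext3/ext4.
    for i, seconds in enumerate(time_fsyncs(tempfile.gettempdir())):
        print(f"fsync {i}: {seconds * 1000:.2f} ms")
```

this says nothing about whether the drive's write cache is honoring the flush, which is the "fsync lying" problem mentioned earlier in the thread; it only shows how long the call takes to return.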

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)