Re: How reliable is XFS under Gluster?

On 12/06/13 10:57, Kal Black wrote:
Hello,
I am at the point of picking a FS for new brick nodes. I have liked and
used ext4 until now, but I recently read about an issue introduced by a
patch in ext4 that breaks the distributed translator. At the same time,
it looks like the recommended FS for a brick is no longer ext4 but XFS,
which apparently will also be the default FS in the upcoming Red Hat 7.
On the other hand, XFS is known as a file system that can be easily
corrupted (zeroing files) in case of a power failure. Supporters of the
file system claim that this should never happen if the application has
been properly coded (properly committing/fsync-ing data to storage) and
the storage itself has been properly configured (disk cache disabled on
the individual disks and battery-backed cache used on the controllers).
My question is: should I be worried about losing data in a power
failure or similar scenarios using GlusterFS and XFS? Are there best
practices for setting up a Gluster brick + XFS? Has the ext4 issue been
reliably fixed? (My understanding is that this is impossible unless
ext4 itself is modified to work properly with Gluster.)

The ext4<>Gluster issue has been fixed in the latest Gluster release.

As for XFS, I have never run into truncated files in my eight-ish years of using it in production and at home. With both ext3/4 and XFS, metadata is committed to a transaction log, while file data is written directly in place. After a power outage, the filesystem structure is intact thanks to the transaction log, but recently written file data may not be.

You will always have the issue of file data corruption when a non-battery backed write cache is in use. (The kernel VFS cache is such a beast.)

So, you can end up with a file that is the correct length (not truncated) but full of NULs. I have had that happen: the metadata was transacted to the log, but the file data (contents) was still in the VFS write cache when I hit a kernel deadlock.

Ext3/4 have an option to write file data to the transaction log along with the metadata. You end up with roughly twice the IO, since the data is later rewritten to its proper place, but the window for file-data corruption at a power outage is much smaller. You also lose the speed gain of the VFS write cache.
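For reference, that behaviour is selected with the data= mount option on ext3/4; a sketch (the device and mount point below are placeholders, not from this thread):

```shell
# data=journal: file data goes through the journal along with metadata
# (smaller corruption window, roughly double the write IO)
mount -o data=journal /dev/sdb1 /export/brick1

# data=ordered (the default): only metadata is journaled, but data
# blocks are forced to disk before the metadata that references them
mount -o data=ordered /dev/sdb1 /export/brick1
```

The same modes can be set persistently in /etc/fstab's options field.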

So, an application needs to fsync() any data it requires to be known written to stable storage before relying on it.
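A minimal sketch of that pattern in Python (the helper name and path are illustrative, not from any particular application):

```python
import os

def write_durably(path, data):
    """Write data to path and force it to stable storage.

    Without the fsync() call, the data may still sit in the kernel's
    VFS write cache when power is lost, even though write() returned
    successfully.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # block until the device reports the data written
    finally:
        os.close(fd)

    # For a freshly created file, the directory entry must be made
    # durable too, so fsync the containing directory as well.
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Note that fsync on the file alone does not cover the directory entry, which is why the second fsync is there.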

Applications like sendmail, by default, fsync each mail queue entry at instantiation, before returning an OK result to the sender. You can turn that off, but doing so loses the transactional guarantee of SMTP.

--
Mr. Flibble
King of the Potato People
http://www.linkedin.com/in/RobertLanning
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



