On 08/29/2012 03:48 AM, Brian Candler wrote:
> Does anyone have any experience running gluster with XFS and MD RAID
> as the backend, and/or LSI HBAs, especially bad experience?
>

We have a few servers with 12-drive LSI RAID controllers that we use for
gluster (running XFS on RHEL 6.2). I don't recall seeing major issues,
but to be fair these particular systems see more hacking/dev/unit-test
work than longevity or stress testing. We are also not using MD in any
way (hardware RAID only).

I'd be happy to throw a similar workload at one of them if you can
describe your configuration in a bit more detail: the specific MD
configuration (RAID level, chunk size, etc.), the XFS format options and
mount options, anything else that might be in the I/O stack (LVM?), and
the specific bonnie++ test you're running (a single instance, or some
kind of looping test?). There's a rough sketch of the commands that
would capture most of this at the end of this message.

> In a test setup (Ubuntu 12.04, gluster 3.3.0, 24 x SATA HD on LSI
> MegaRAID controllers, MD RAID) I can cause XFS corruption just by
> throwing some bonnie++ load at the array - locally, without gluster.
> This happens within hours. The same test run over a week doesn't
> corrupt with ext4.
>
> I've just been bitten by this in production too, on a gluster brick I
> hadn't converted to ext4. I have the details and can post them
> separately if you wish, but the main symptoms were XFS timeout errors
> and stack traces in dmesg, plus XFS corruption (requiring a reboot and
> an xfs_repair that showed lots of errors, almost certainly with some
> data loss).
>

Could you collect the generic data and post it to linux-xfs? Somebody
might be able to read further into the problem via the stack traces. It
also might be worth testing an upstream kernel on your server, if
possible. (Again, a sketch of what to collect is at the end of this
message.)

Brian

> However, this leaves me with some unpalatable conclusions, and I'm not
> sure where to go from here.
>
> (1) XFS is a shonky filesystem, at least in the version supplied in
> Ubuntu kernels. This seems unlikely given its pedigree and the fact
> that it is heavily endorsed by Red Hat for their storage appliance.
>
> (2) Heavy write load in XFS is tickling a bug lower down in the stack
> (either MD RAID or the LSI mpt2sas driver/firmware), but heavy write
> load in ext4 doesn't. This would have to be a gross error, such as
> blocks queued for write being thrown away without being sent to the
> drive.
>
> I guess this is plausible - perhaps the usage pattern of write
> barriers is different, for example. However, I don't want to point the
> finger there without direct evidence either. There are no block I/O
> error events logged in dmesg.
>
> The only way I can think of to pin this down is to find the smallest
> MD RAID array I can reproduce the problem with, and then try to build
> a new system with a different controller card (as MD RAID + JBOD,
> and/or as a hardware RAID array).
>
> However, while I try to see what I can do about that, I would be
> grateful for any other experience people have in this area.
>
> Many thanks,
>
> Brian.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
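
For reference, the configuration details asked about above could mostly
be captured with something like the following. This is only a sketch:
/dev/md0 and /brick are stand-ins for whatever the actual array device
and brick mount point are, and the bonnie++ line is just one example of
a looping run.

    # /dev/md0 and /brick are placeholders - substitute the real device
    # and mount point.

    # kernel and MD layout
    uname -a
    cat /proc/mdstat
    mdadm --detail /dev/md0        # RAID level, chunk size, layout

    # XFS geometry and mount options
    xfs_info /brick
    grep /brick /proc/mounts

    # example of a looping bonnie++ run: -x 10 repeats the run ten
    # times, -u is the user to run as when started as root (the test
    # directory is assumed to exist and be writable by that user)
    bonnie++ -d /brick/test -u nobody -x 10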
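
Similarly, the generic data for a linux-xfs report would be along these
lines. Again a sketch with the same assumed names; note that
xfs_repair -n (no-modify mode) should only be run against an unmounted
filesystem.

    # placeholders again: /brick is the brick mount point, /dev/md0 the
    # underlying MD device
    uname -a > xfs-report.txt
    dmesg >> xfs-report.txt                    # timeouts and stack traces
    modinfo mpt2sas | grep -i version >> xfs-report.txt
    xfs_info /brick >> xfs-report.txt          # while still mounted
    umount /brick
    xfs_repair -n /dev/md0 >> xfs-report.txt   # report-only, changes nothing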
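
And for the narrowing-down idea in the quoted text, a minimal
small-array reproduction could look roughly like this, with hypothetical
device names and with --level/--chunk matched to whatever the production
array uses:

    # four spare disks (hypothetical names sdb-sde) in a small MD array
    mdadm --create /dev/md1 --level=6 --chunk=512 --raid-devices=4 /dev/sd[b-e]
    mkfs.xfs /dev/md1
    mkdir -p /mnt/test && mount /dev/md1 /mnt/test
    chown nobody /mnt/test
    bonnie++ -d /mnt/test -u nobody -x 10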