On Tue, Dec 16, 2014 at 12:34:05PM +0100, Emmanuel Florac wrote:
> The RAID hardware is an Adaptec 71685 running the latest firmware
> (32033). This is a 16-drive RAID-6 array of 4 TB HGST drives. The
> problem occurs repeatedly with any combination of 7xx5 controllers
> and 3 or 4 TB HGST drives in RAID-6 arrays of various types, with XFS
> or JFS (it never occurs with either ext4 or reiserfs).

Do you have systems with any other type of 3/4TB drives in them?

> As I mentioned, when the disk drives' cache is on the corruption is
> serious. With the disk cache off, the corruption is minimal, but the
> filesystem shuts down.

That really sounds like a hardware problem - maybe with the disk drives
themselves, not necessarily the controller.

> The filesystem has been primed with a few (23) terabytes of mixed
> data: small (a few KB or less), medium, and big (a few gigabytes or
> more) files. Two simultaneous, long-running copies are made (cp -a
> somedir someotherdir), while three simultaneous, long-running read
> operations are run (md5sum -c mydir.md5 mydir), all while the array
> is busy rebuilding. Disk usage (as reported by iostat -mx 5) stays
> solidly at 100%, with a continuous throughput of a few hundred
> megabytes per second. The full test runs for about 12 hours (when not
> failing), and ends up copying 6 TB or so and md5summing 12 TB or so.

> > I'd start with upgrading the firmware on your RAID controller and
> > turning the XFS error level up to 11....
>
> The firmware is the latest available. How do I turn logging up to 11,
> please?

# echo 11 > /proc/sys/fs/xfs/error_level

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
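
[Editor's note: the stress workload Emmanuel describes above (prime data,
copy with cp -a, verify with md5sum -c) can be sketched at toy scale as
follows. The paths, file counts, and sizes here are illustrative stand-ins,
not taken from the original report.]

```shell
#!/bin/sh
# Toy-scale sketch of the reported workload: prime a directory with data,
# copy it with cp -a, and verify checksums with md5sum -c.
# All paths and sizes are illustrative, not from the original report.
set -e

work=$(mktemp -d)
mkdir "$work/somedir"

# Prime the tree with a few small files of random data.
for i in 1 2 3; do
    head -c 4096 /dev/urandom > "$work/somedir/file$i"
done

# Record checksums of the primed data.
( cd "$work" && md5sum somedir/* > somedir.md5 )

# The long-running copy pass (cp -a somedir someotherdir in the report).
cp -a "$work/somedir" "$work/someotherdir"

# The read/verify pass (md5sum -c in the report); prints nothing on
# success with --quiet, so append OK as a success marker.
result=$(cd "$work" && md5sum -c --quiet somedir.md5 && echo OK)

rm -rf "$work"
echo "$result"
```

At full scale the report runs two copy passes and three verify passes concurrently, for roughly 12 hours, while the array rebuilds.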