On 09/18/2012 01:49 AM, Dave Chinner wrote:
> On Mon, Sep 17, 2012 at 04:56:19PM +0200, Richard Ems wrote:
>> Hi all,
>>
>> saturday morning one hard disc on our RAID6 failed. About one hour later,
>> the XFS running on that device reported the following error:
>>
>> XFS (sdd1): Internal error xfs_da_do_buf(2) at line 2097 of file /usr/src/packages/BUILD/kernel-default-3.3.6/linux-3.3/fs/xfs/xfs_da_btree.c.
> .....
>> Sep 15 07:30:51 fs1 kernel: [7369085.792619] XFS (sdd1): Corruption detected. Unmount and run xfs_repair
>>
>>
>> And this repeating again and again ...
>>
>> This system has been running fine for 87 days, no power outages or such.
>> It's connected to an UPS, and the H800 Raid Controller has a BBU installed.
> .....
>> Why could this have happened?
>
> Something went wrong at the RAID level (i.e. your hardware) in
> handling the disk failure and recovering the array. It corrupted
> blocks in the volume rather than recovering them cleanly without
> errors. The corrupted blocks happened to be in a directory block,
> and a frequently accessed one according to the errors in the log.
>
> What you found in lost+found was the recoverable fragments of the
> directory and whatever else was corrupted during the disk failure
> incident.
>
>> What more info can I provide to understand this issue and avoid
>> this to happen again?
>
> I'd be asking your hardware vendor about why it corrupted the
> volume on a single disk failure when it is supposed to be able to
> transparently handle double disk failures without losing/corrupting
> data.
>
> Cheers,
>
> Dave.
>

Ok, many thanks Dave.
I will forward this conversation to the DELL guys ...

Thanks again,
Richard

--
Richard Ems       mail: Richard.Ems@xxxxxxxxxxxxxxxxx

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs