On 05/09/2010 10:20 PM, Dave Chinner wrote:
> On Sun, May 09, 2010 at 08:48:00PM +0200, Rainer Fuegenstein wrote:
>>
>> today in the morning some daemon processes terminated because of
>> errors in the xfs file system on top of a software raid5, consisting
>> of 4*1.5TB WD caviar green SATA disks.
>
> Reminds me of a recent(-ish) md/dm readahead cancellation fix - that
> would fit the symptoms of (btree corruption showing up under heavy IO
> load but no corruption on disk. However, I can't seem to find any
> references to it at the moment (can't remember the bug title), but
> perhaps your distro doesn't have the fix in it?
>
> Cheers,
>
> Dave.

That sounds plausible, as does a hardware error. A memory bit flip under
heavy load would leave the in-memory data corrupt while the on-disk data
is still good. Because the check was only run later, the bad in-memory
copy had been flushed at some point, and when the data was reloaded from
disk it came in OK.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband