It seems that the change in behavior was just a strange coincidence: The
error is back after 86790 seconds of uptime! I will now run two
filesystem checks and send the output when it is finished.

Best regards,
Reiner.

> -----Original Message-----
> From: linux-ide-owner@xxxxxxxxxxxxxxx [mailto:linux-ide-owner@xxxxxxxxxxxxxxx] On Behalf Of Buehl, Reiner
> Sent: Friday, May 21, 2010 4:40 PM
> To: tytso@xxxxxxx; Tim Small; Dmitry Monakhov
> Cc: linux-ide@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx
> Subject: RE: ext3 filesystem corruption on md RAID1 device
>
> I did run a forced sync check like Tim had suggested and did not get
> any errors there. After that I thought that it might be wise to
> disconnect the other RAID1 arrays to prevent damage to them. And now it
> gets strange: When I rebooted, I did not get any EXT3-fs error messages
> any more. Further investigation of the disconnected drives showed that
> one of the four WD disks that is part of one of the two other,
> unrelated md devices showed SMART errors. I replaced the disk and now
> the system has been running without any EXT3-fs error for nearly 24
> hours!
>
> Is it possible that a faulty disk that is not part of a specific md
> RAID1 device causes filesystem errors on an md RAID1 device on a
> different set of disks that are connected to the same SATA
> controller??? Or is this just a weird coincidence?
>
> Best regards,
> Reiner.
>
> > -----Original Message-----
> > From: tytso@xxxxxxx [mailto:tytso@xxxxxxx]
> > Sent: Thursday, May 20, 2010 4:31 PM
> > To: Buehl, Reiner
> > Cc: linux-ide@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx
> > Subject: Re: ext3 filesystem corruption on md RAID1 device
> >
> > On Thu, May 20, 2010 at 10:08:21AM +0000, Buehl, Reiner wrote:
> > > Hi,
> > >
> > > I keep getting ext3 filesystem corruptions on one of my md RAID1
> > > arrays. Shortly after booting, I get messages like the following one:
> > >
> > > EXT3-fs error (device md1): htree_dirblock_to_tree: bad entry in
> > > directory #17269110: rec_len is smaller than minimal - offset=0,
> > > inode=0, rec_len=0, name_len=0
> >
> > This looks like a block got completely zero'ed out. One interesting
> > question is whether the corruption is happening on the read side (when
> > transferring data from the disk to memory) or on the write side (when
> > transferring data from memory to disk). So something that's worth
> > doing is to grab the output of e2fsck, and see if it is trying to
> > fix the directory inode reported by the EXT3-fs error syslog.
> >
> > Another thing that's worth doing is to try running e2fsck -fy /dev/md1
> > a second time. If you see errors in that second fsck run, then it's
> > time to suspect that either (a) the storage stack isn't reliably
> > reading from disk, or (b) the storage stack isn't reliably writing to
> > the disk. There is the possibility of an e2fsck bug, but that seems
> > unlikely in this context. If you save the outputs from each e2fsck
> > run, I can look at them and tell you whether it's likely an e2fsck bug
> > or, what seems more likely, a storage stack failure.
> >
> > Regards,
> >
> > - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
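
Ted's suggestion of checking whether e2fsck is trying to fix the directory
inode named in the EXT3-fs error could be done with a small script once the
e2fsck output has been saved. What follows is only a minimal sketch under
assumptions: the function names are made up for illustration, the kernel
message is assumed to sit on a single syslog line in the form quoted in this
thread, and the match against the e2fsck output is a plain search for the
inode number, since the thread does not show what that output looks like.

```python
import re

# Matches the kernel message quoted in this thread, for example:
#   EXT3-fs error (device md1): htree_dirblock_to_tree: bad entry in
#   directory #17269110: ...
# Assumption: in the real syslog the message is on one line.
KERNEL_ERR = re.compile(r"EXT3-fs error \(device (\w+)\):[^\n]*?directory #(\d+)")


def reported_directory_inodes(syslog_text):
    """Return (device, inode) pairs for EXT3-fs directory errors in syslog text."""
    return [(dev, int(ino)) for dev, ino in KERNEL_ERR.findall(syslog_text)]


def lines_mentioning_inode(fsck_output, inode):
    """Return saved e2fsck output lines that contain the inode as a whole number."""
    # The lookarounds keep 17269110 from matching inside 172691100.
    pattern = re.compile(r"(?<!\d)%d(?!\d)" % inode)
    return [line for line in fsck_output.splitlines() if pattern.search(line)]
```

With the syslog and the saved output of each e2fsck run read into strings,
one would call reported_directory_inodes() on the syslog text and then
lines_mentioning_inode() on each run's output for every inode found. An
empty result for a run only means the inode number does not appear in that
output; it does not by itself say whether the corruption is on the read side
or the write side.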