>>>> We highly doubt it's hardware failures with this frequency in mind, so
>>>> we're wondering regarding to this issue if there's some ext3 bug-fix
>>>> having merged into mainline but not in our old kernel?
>>>
>>> Absolutely there are. There have been 87 changes just to namei.c since 2.6.16.
>>> You could look through git logs to see if anything looks applicable.
>>>
>>> You might try:
>>>
>>> ef2b02d3e617cb0400eedf2668f86215e1b0e6af ext34: ensure do_split leaves enough free space in both blocks
>>
>> I've been asked to investigate this issue. Thanks for the reply!
>>
>> I found this fix while searching for similar bug reports, but I don't think it
>> worths trying as we don't use dir_index feature.
>>
>> I've collected some logs in different machines, and the error was always
>> triggered in ext3_readdir:
>>
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #6685458: rec_len is smaller than minimal - offset=3860, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #9650541: rec_len is smaller than minimal - offset=3960, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #11124783: rec_len is smaller than minimal - offset=4072, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4024, inode=0, rec_len=0, name_len=0
>> EXT3-fs error (device sda7): ext3_readdir: bad entry in directory #52740880: rec_len is smaller than minimal - offset=4084, inode=0, rec_len=0, name_len=0
>>
>> The last two errors happened on the same machine, and the same inode! One
>> happened in 11/22 (I was told they had run fsck later on), and one in 12/01.
>
> So now this directory has been fscked to be right?

right.

> You can try by just
> ls this directory and check whether there are any errors in dmesg.
>

no error at all.

> Having said that, as this error happens 2 times for the same inode,
> maybe there is a kernel bug. At least as Ted said in another mail, the
> end of this buffer head seems to be cleared. So I guess next time when
> you see this error, please do:
> 1. use debugfs to find the disk layout for this dir
> 2. read the blocks from the block device directly
> 3. check whether the end of a block(from offset to the end) is zeroed.
> 4. If yes, I guess there should be a kernel bug and we can go on to
> investigate the code.
>

Would this give different output from dumping the directory via debugfs?
If so, I'll try it next time.

Judging from the output dumped via debugfs on one machine, more than half of
the dir block is all zero, but the offset is near 4K. I also checked several
other machines and saw no difference.

Regards
Li Zefan
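
For reference, steps 2 and 3 of the quoted procedure could be scripted roughly as below. This is only a sketch under stated assumptions: a 4 KiB filesystem block size, the standard little-endian ext3 directory entry header (inode, rec_len, name_len, file_type), and a block number obtained by hand in step 1 (for example from `debugfs -R "stat <6685458>" /dev/sda7`). The function names `read_block`, `first_bad_entry` and `tail_is_zeroed` are made up for this illustration and are not part of any existing tool.

```python
import struct

BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem blocks
MIN_REC_LEN = 12   # 8-byte header plus a 1-byte name, rounded up to 4


def read_block(device_path, block_no, block_size=BLOCK_SIZE):
    """Read one filesystem block straight from the block device (or an image)."""
    with open(device_path, "rb") as dev:
        dev.seek(block_no * block_size)
        data = dev.read(block_size)
    if len(data) != block_size:
        raise IOError("short read at block %d" % block_no)
    return data


def first_bad_entry(block):
    """Walk the directory entries in a block.

    Returns the offset of the first entry whose rec_len is invalid,
    or None if the whole block parses cleanly.
    """
    offset = 0
    while offset < len(block):
        if offset + 8 > len(block):
            return offset  # not even room for an entry header
        inode, rec_len, name_len, file_type = struct.unpack_from(
            "<IHBB", block, offset)
        needed = (name_len + 8 + 3) & ~3  # header + name, 4-byte aligned
        if (rec_len < MIN_REC_LEN or rec_len % 4 != 0
                or rec_len < needed or offset + rec_len > len(block)):
            return offset
        offset += rec_len
    return None


def tail_is_zeroed(block, offset):
    """True if every byte from offset to the end of the block is zero."""
    tail = block[offset:]
    return len(tail) > 0 and not any(tail)


if __name__ == "__main__":
    # Synthetic block imitating the first log line: one valid entry that
    # ends at offset 3860, followed by nothing but zeros.
    blk = bytearray(BLOCK_SIZE)
    struct.pack_into("<IHBB", blk, 0, 2, 3860, 1, 2)
    blk[8:9] = b"."
    bad = first_bad_entry(bytes(blk))
    print("first bad entry at offset:", bad)
    print("tail zeroed from there:", tail_is_zeroed(bytes(blk), bad))
```

On a real machine one would call `read_block("/dev/sda7", <block number from step 1>)` as root and pass the result to the two checks. A bad offset that matches the one in the kernel log, with an all-zero tail, would be the pattern step 4 asks about.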