Hi,

On Thu, 11 Feb 2010 00:46:31 +0100, Sebastian Reichelt wrote:
> Hi,
>
> thanks for your really quick reply.
>
> > The BUG tells that nilfs met a corrupted block during lookup of a
> > btree.
> >
> > Can you confirm which version of 2.6.31 kernel you were using?
>
> The problem was, I had compiled the kernel on the same partition
> that is corrupted now. Anyway, by modifying the line in btree.c, I
> was able to recover most files, including the kernel
> tarball. (Though something like 1000 files are inaccessible.) I
> still cannot figure out the exact release number, though, because I
> can't find any place in the kernel source where it is written
> down. It was a kernel from the Debian testing distribution,
> downloaded on October 24. All files in the tarball are dated
> September 10. Does that help?

September 10 is the release date of the 2.6.31 mainline kernel.

Hmm.. I guess your corruption is related to the missing bug fixes.
Could you try the latest Debian kernel (i.e. 2.6.32-trunk) or one of
the latest stable kernels?

> I had actually seen the post on www.nilfs.org about a file system
> corruption fix in nilfs 2.0.17, but when I downloaded the Linux
> source from Debian 3 weeks later, I thought for sure they would
> include the fix. I didn't know Debian testing was this far behind on
> critical bugfixes.

Linus merged the fixes promptly, and they were sent to the stable
kernel team at the same time. I heard that Gentoo picked up the fixes
relatively quickly. I don't know about the Debian case, but it might
take longer.

> > > (Unfortunately, it is executed even if I use the "ro" option on the
> > > Linux command line -- why?!) It happens during a sys_open call. If
> > > the entire stack trace helps, I could post that, too.
> >
> > Sounds weird..
>
> I still don't understand why nilfs_cleanerd was started even if the
> root fs was mounted read-only, but with the patched nilfs, it became
> clear that a library required by nilfs_cleanerd was among the files
> that were inaccessible, and that's why it crashed right away. If I
> mount the file system later instead (as a non-root FS), the mount
> operation actually completes; the kernel just crashes when I try to
> access some files (with an unpatched/older nilfs such as the one in
> the current Ubuntu release).

Thanks for telling me the details.

For the first issue, is there a possibility that the partition was
remounted read/write? An rw-remount also invokes cleanerd, though
mount.nilfs2 only runs it after the remount operation has succeeded.

> > Unfortunately, the disk image is rarely-helpful from my experiences
> > since it's hard to track the cause from a corrupted state.
>
> OK, I sort of expected that it wouldn't help. Then again, since
> there is no practical way of reproducing the bug, I thought the end
> result might be the only thing one can analyze to find the
> bug. Well, I'm glad I'm not a file system developer. :-)

Since the nilfs GC moves disk blocks, such analysis is harder than on
regular filesystems. Fortunately, the recent nilfs builds are pretty
stable, and this makes things easier than before ;)

> Best regards,
> Sebastian Reichelt

With regards,
Ryusuke Konishi