Hi!

>> Got any time to use
>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>> to determine which patch fixed it?
>
> Yep, I'm currently working on it. But it will take some time.
> I'm currently at the 2nd bisection, so expect my result in 3 or four days.

OK, the good news is: git-bisect worked and returned a patch.
The bad news is: it's a patch for reiserfs, fixing a race condition.
You can find it here:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d62b1b87a7d1c3a21dddabed4251763090be3182

This means either:

a) A similar bug exists in other filesystems as well, since we could
   reproduce it with ext3 and xfs. This seems pretty unlikely to me.

or:

b) There are two bugs with similar symptoms. The one we are looking for
   is either

   b1) fixed in the 2.6.15+ kernels I used for git-bisect, and I was
       chasing the wrong bug.

   or:

   b2) not fixed in those kernels, but significantly harder (or maybe
       impossible) to reproduce on my test setup than the reiserfs bug
       (for whatever reason). In this case it never had a chance to
       wreck my filesystem, because the reiserfs bug did that first.

To sum it up, this is what may have happened:

1) I experience a bug which, after some investigation, seems to be
   related to dm-crypt over raid5. It happens with reiserfs, ext3 and
   xfs.
2) I set up a test system using reiserfs and verify that the bug still
   occurs.
3) The symptoms of the bug occur, but are caused by a different bug in
   the filesystem code. I don't know that and think the test setup is
   suitable for analyzing the bug.
4) I use a newer kernel and the symptoms disappear. I think some patch
   in the new kernel addresses the bug I'm looking for, but the bug that
   really is fixed is the one in the reiserfs code.
5) I try to find the patch fixing "my" bug, but instead I find a patch
   fixing the reiserfs bug.
6) I know how git-bisect works (pretty cool tool, btw). At least one
   positive outcome ;)

What next?

Maybe:

1) Try to reproduce "our" bug with that reiserfs patch applied to a
   kernel that is known to corrupt ext3/xfs filesystems, to find out
   whether we are dealing with two bugs or not.
2) If it really is two bugs, try to find out whether "ours" is fixed in
   the newer kernels, and use git-bisect again (either with that
   reiserfs patch applied or with another filesystem) to determine what
   fixed it.
   If there is only the reiserfs bug, look into the other filesystems,
   check whether they are subject to similar race conditions, and fix
   those, if not already done.

Did anybody here test the newest kernel with a filesystem other than
reiserfs?

Kevin