Re: Re: dm-crypt is broken and causes massive data corruption

Hi!

>>Got any time to use
>>  http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>>to determine which patch fixed it?
> 
> Yep, I'm currently working on it. But it will take some time.
> I'm currently at the 2nd bisection, so expect my result in 3 or four days.

OK, the good news is: git-bisect worked and returned a patch.
The bad news is: it's a patch for reiserfs, fixing a race condition.
You can find it here:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d62b1b87a7d1c3a21dddabed4251763090be3182


This means either:

a) A similar bug appears in other filesystems as well, since we could
reproduce it with ext3 and xfs. This seems pretty unlikely to me.

or:

b) There are two bugs with similar symptoms. The one we are looking for
is either

b1) fixed in the 2.6.15+ kernels I used for git-bisect and I was
chasing the wrong bug.

or:

b2) not fixed in those kernels, but significantly harder (or maybe not at
all) to reproduce on my test setup than the reiserfs bug (for whatever
reason). In this case it never had a chance to wreck my filesystem,
because the reiserfs bug did that first.


To sum it up, this is what may have happened:

1) I experience a bug which, after some investigation, seems to be related
to dm-crypt over raid5. It happens with reiserfs, ext3 and xfs.

2) I set up a test system using reiserfs and verify that the bug still
occurs.

3) The symptoms of the bug occur but are caused by a different bug in
the filesystem code. I don't know that and think the test setup is
suitable for analyzing the bug.

4) I use a newer kernel and the symptoms disappear. I think some patch
in the new kernel addresses the bug I'm looking for, but the bug that
really is fixed is the one in the reiserfs code.

5) I try to find the patch fixing "my" bug, but instead I find a patch
fixing the reiserfs bug.

6) I know how git-bisect works. (pretty cool tool btw)

At least a positive outcome ;)


What next? Maybe:

1) Try to reproduce "our" bug with that reiserfs patch applied to a
kernel that is known to corrupt ext3/xfs filesystems, to find out
whether we are dealing with two bugs or not.

2) If it really is two bugs, try to find out if "ours" is fixed in the
newer kernels and use git-bisect again (either with that reiserfs patch
applied or with another filesystem) to determine what fixed it.
If there is only the reiserfs bug, look into the other filesystems,
check whether they are subject to similar race conditions, and fix those
if that has not already been done.
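
For anyone who wants to repeat the search: the idea behind bisecting for
a fix (rather than for a breakage) can be sketched in a few lines of
Python. This is only an illustration of the binary search, not what
git-bisect does internally. It assumes a linear, chronologically ordered
history in which the oldest commit still corrupts data, the newest does
not, and a fix stays in once it is there. The commit names and the
still_clean() check are made up for the example; in practice the check
is "build that kernel, run the test setup, look for corruption".

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is true.

    Assumes commits[0] is known broken, commits[-1] is known fixed,
    and every commit after the fix is also fixed.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi]


# Hypothetical history of 16 commits; the fix is assumed to land in "c11".
history = [f"c{i}" for i in range(16)]
tested = []


def still_clean(commit: str) -> bool:
    """Stand-in for a real test run on a kernel built from this commit."""
    tested.append(commit)
    return int(commit[1:]) >= 11


result = first_fixed(history, still_clean)
print("first fixed commit:", result)
print("kernels built and tested:", len(tested))
```

Each round halves the remaining range, so the number of kernels to build
grows only with the logarithm of the number of commits. With git-bisect
itself the labels have to be swapped for this kind of search: kernels
that still corrupt data are marked "good" and kernels with the fix are
marked "bad", because the tool looks for the first "bad" commit.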

Did anybody here test the newest kernel with a filesystem other than
reiserfs?

Kevin
