On Thu, May 1, 2008 at 5:21 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Thu, May 1, 2008 at 2:19 PM, George Spelvin <linux@xxxxxxxxxxx> wrote:
> [..]
>
> > But let me just ask... the RAID-5 repair code is known to work, right?
> > So the situation I've got above points to some lower-level problem?
> > It's not just somehow forgetting to write out the corrections and
> > I'm seeing the same mismatches over and over again?
> >
> > Any other debugging suggestions?
>
> I can reproduce this here, and until I can track down what happened
> the fix is reverting commit bd2ab67030e9116f1e4aae1289220255412b37fd
> "md: close a livelock window in handle_parity_checks5".
>
> That fix was tested to close a livelock condition for which I had a
> reproducible test case, but I did not regression test
> 'echo repair > sync_action'. My fear was that this compromised
> re-adding a dirty disk to a degraded array but that appears unaffected.
>

The repair operation is detecting the mismatch, then correcting it in
memory without writing it back. I have a potential fix that I will
send in a separate thread. I want to put it through more testing and
get it reviewed by Neil.

--
Dan
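
As a toy illustration of the failure mode described above (correcting a
mismatch in memory without writing it back), here is a minimal Python
sketch. It is not the md code and does not model handle_parity_checks5;
the stripe layout, the repair_pass() helper and the write_back flag are
all hypothetical names invented for the example. It only shows why the
same mismatch would be reported on every pass if the correction never
reaches the disk.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR parity over a list of integer 'blocks' (toy stand-in for RAID-5 parity)."""
    return reduce(xor, blocks, 0)


def repair_pass(disk, write_back):
    """Scan every stripe on the toy 'disk' and return the number of mismatches found.

    disk is a list of dicts of the form {"data": [ints], "parity": int}.
    Each stripe is first copied into a cache, mimicking an in-memory stripe.
    With write_back=False the corrected parity stays in the cache only,
    which is the (assumed) shape of the bug described in this thread.
    """
    mismatches = 0
    for stripe in disk:
        cached = dict(stripe)              # in-memory copy of the stripe
        expected = parity(cached["data"])
        if cached["parity"] != expected:
            mismatches += 1
            cached["parity"] = expected    # corrected in memory
            if write_back:
                stripe["parity"] = cached["parity"]  # persisted to "disk"
    return mismatches


if __name__ == "__main__":
    # One consistent stripe (1 ^ 2 ^ 3 == 0) and one with bad parity (4 ^ 5 == 1).
    disk = [{"data": [1, 2, 3], "parity": 0}, {"data": [4, 5], "parity": 99}]
    # Without write-back, the second pass sees the same mismatch again.
    print(repair_pass(disk, write_back=False), repair_pass(disk, write_back=False))
    # With write-back, the mismatch is gone on the following pass.
    print(repair_pass(disk, write_back=True), repair_pass(disk, write_back=True))
```

In this model the mismatch count only drops to zero on a later pass once
the corrected parity is written back, which matches the symptom of seeing
the same mismatches over and over again.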