Neil Brown wrote:
On Thursday May 1, dan.j.williams@xxxxxxxxx wrote:
commit bd2ab67030e9116f1e4aae1289220255412b37fd "md: close a livelock
window in handle_parity_checks5" introduced a bug in handling 'repair'
operations. After a repair operation completes we clear the state bits
tracking this operation. However, they are cleared too early and this
results in the code deciding to re-run the parity check operation. Since
we have done the repair in memory the second check does not find a mismatch
and thus does not do a writeback.
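
For readers following along, the ordering problem described above can be sketched with a toy model. This is not the raid5.c code: the struct, the single repair_complete flag, and the function names are all hypothetical stand-ins for the real .pending and .complete bits. It assumes a stripe with two data blocks and one XOR parity block, and only shows why clearing the state before the writeback decision leaves the on-disk parity stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stripe: two data blocks plus one XOR parity block.
 * All names here are hypothetical; this is not the raid5.c code. */
struct toy_stripe {
    unsigned char data[2];
    unsigned char mem_parity;   /* parity as cached in memory */
    unsigned char disk_parity;  /* parity as stored on disk */
    bool repair_complete;       /* stand-in for the "complete" state bit */
};

static unsigned char toy_xor(const struct toy_stripe *s)
{
    return s->data[0] ^ s->data[1];
}

/* Run one repair pass. With clear_early set, the state bit is dropped
 * before the writeback decision, so the check is re-run against the
 * already-corrected in-memory parity. Returns true if parity reached disk. */
static bool toy_repair(struct toy_stripe *s, bool clear_early)
{
    assert(s != NULL);

    /* First check: a mismatch triggers an in-memory repair. */
    if (s->mem_parity != toy_xor(s)) {
        s->mem_parity = toy_xor(s);
        s->repair_complete = true;
    }

    if (clear_early)
        s->repair_complete = false;  /* state lost too soon */

    if (!s->repair_complete) {
        /* Re-run the check: memory is now consistent, so no mismatch
         * is seen and no writeback is scheduled. */
        if (s->mem_parity == toy_xor(s))
            return false;
    }

    /* Writeback, then clear the state bit. */
    s->disk_parity = s->mem_parity;
    s->repair_complete = false;
    return true;
}
```

Calling toy_repair() with clear_early set on a stripe whose parity is wrong fixes mem_parity but leaves disk_parity untouched; with clear_early unset, the corrected parity is written back.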
yes....
I must admit that I find that code fairly hard to make sense of, but I
can see how it was failing before and how this fixes it, and testing
confirms that, so I suspect it is right.
I cannot help feeling that there must be some way to simplify all
those .pending and .complete bits and make it somewhat clearer, but I
haven't been able to figure out how :-(
So: Acked-by: NeilBrown <neilb@xxxxxxx>
I'm heading for a weekend, but feel free to send this to akpm.
Hmm. Should this be sent to -stable as well? I was just bitten by
this very bug here, and after applying the patch and rebooting the
problem went away... 2.6.25.0 here.
/mjt