Re: [PATCH] md: Fix nr_pending race during raid10 recovery

Aniket Kulkarni <aniket@xxxxxxxxxxx> · Thu, 25 Nov 2010 01:36:42 +0000 (UTC)

Neil Brown <neilb <at> suse.de> writes:
> > The fix is -
> > 
> > 1. Increment wr.nr_pending immediately after selecting a good target.
> >    Of course, the decrements will be added to the error paths in
> >    sync_request and end_sync_read.
> > 2. Don't submit recovery IOs to faulty targets
> 
> Hi again,
>  I've been thinking about this some more and cannot see that it is a real
>  problem.
>  Do you have an actual 'oops' showing a crash in this situation?
> 
>  The reason it shouldn't happen is that devices are only removed by
>  remove_and_add_devices, and that is only called when no resync/recovery is
>  happening.
>  So when a device fails, the recovery will abort (waiting for all requests to
>  complete), then failed devices are removed and possibly spares are added,
>  then possibly recovery starts up again.
> 
>  So it should work correctly as it is....

Hi Neil

You are right, the 'oops' is possible only if devices can be removed during an 
active recovery.
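
To make the scenario concrete, here is a minimal userspace sketch of the
reference-counting pattern under discussion. It is not the md code and not
the patch itself: struct toy_dev and the toy_* helpers are hypothetical names
invented for illustration, and C11 atomics stand in for the kernel's atomic_t.
It only shows the idea of taking the reference as soon as a target is chosen,
dropping it on every exit path, and refusing faulty targets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a member device; NOT the kernel's structure. */
struct toy_dev {
    atomic_int  nr_pending;  /* in-flight I/Os holding a reference */
    atomic_bool faulty;      /* set once the device has failed */
};

/*
 * Take a reference immediately after choosing the target.
 * Returns NULL for a faulty device, so no recovery I/O is issued to it.
 * Assumption: the real code would need locking (e.g. RCU) around the
 * check-then-increment; this sketch is single-threaded.
 */
static struct toy_dev *toy_get_target(struct toy_dev *d)
{
    if (atomic_load(&d->faulty))
        return NULL;
    atomic_fetch_add(&d->nr_pending, 1);
    return d;
}

/* Drop a reference previously taken with toy_get_target(). */
static void toy_put_target(struct toy_dev *d)
{
    int prev = atomic_fetch_sub(&d->nr_pending, 1);
    assert(prev > 0);  /* an unbalanced put would be a bug */
    (void)prev;
}

/* A device may only be removed while nothing holds a reference to it. */
static bool toy_can_remove(struct toy_dev *d)
{
    return atomic_load(&d->nr_pending) == 0;
}

/*
 * Models one recovery request. io_ok simulates whether the I/O succeeded.
 * Every exit path, including the error path, drops the reference.
 */
static int toy_sync_request(struct toy_dev *d, bool io_ok)
{
    struct toy_dev *t = toy_get_target(d);

    if (t == NULL)
        return -1;           /* faulty target: nothing submitted */
    if (!io_ok) {
        toy_put_target(t);   /* error path must also decrement */
        return -2;
    }
    toy_put_target(t);       /* normal completion */
    return 0;
}
```

In this model a removal is only a problem if it can happen while
toy_can_remove() would return false, which matches the point above: the count
only matters once devices can go away during an active recovery.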

I have a patch for that, but I forgot to include it in the original posting.
As you suggested, I will go back and post the patches I have as a series.

Thanks
--
aniket
