Re: [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 19, 2018 at 09:08:53AM +1100, Neil Brown wrote:
> On Sun, Feb 18 2018, Shaohua Li wrote:
> 
> > On Sat, Feb 03, 2018 at 09:19:30AM +1100, Neil Brown wrote:
> >> The locking protocols in md assume that a device will
> >> never be removed from an array during resync/recovery/reshape.
> >> When that isn't happening, rcu or reconfig_mutex is needed
> >> to protect an rdev pointer while taking a refcount.  When
> >> it is happening, that protection isn't needed.
> >> 
> >> Unfortunately there are cases were remove_and_add_spares() is
> >> called when recovery might be happening: is state_store(),
> >> slot_store() and hot_remove_disk().
> >> In each case, this is just an optimization, to try to expedite
> >> removal from the personality so the device can be removed from
> >> the array.  If resync etc is happening, we just have to wait
> >> for md_check_recover to find a suitable time to call
> >> remove_and_add_spares().
> >> 
> >> This optimization and not essential so it doesn't
> >> matter if it fails.
> >> So change remove_and_add_spares() to abort early if
> >> resync/recovery/reshape is happening, unless it is called
> >> from md_check_recovery() as part of a newly started recovery.
> >> The parameter "this" is only NULL when called from
> >> md_check_recovery() so when it is NULL, there is no need to abort.
> >> 
> >> As this can result in a NULL dereference, the fix is suitable
> >> for -stable.
> >> 
> >> cc: yuyufen <yuyufen@xxxxxxxxxx>
> >> Cc: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx>
> >> Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
> >> Cc: stable@xxxxxxxxxxxxxx (v4.8+)
> >> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> >> ---
> >>  drivers/md/md.c    |    4 ++++
> >>  drivers/md/raid5.c |    4 ++--
> >>  2 files changed, 6 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 4e4dee0ec2de..926542fbc892 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -8554,6 +8554,10 @@ static int remove_and_add_spares(struct mddev *mddev,
> >>  	int removed = 0;
> >>  	bool remove_some = false;
> >>  
> >> +	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> >> +		/* Mustn't remove devices when resync thread is running */
> >> +		return 0;
> >> +
> >>  	rdev_for_each(rdev, mddev) {
> >>  		if ((this == NULL || rdev == this) &&
> >>  		    rdev->raid_disk >= 0 &&
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index 98ce4272ace9..3fa97dad3837 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -4448,12 +4448,12 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
> >>  		else if (is_bad) {
> >>  			/* also not in-sync */
> >>  			if (!test_bit(WriteErrorSeen, &rdev->flags) &&
> >> -			    test_bit(R5_UPTODATE, &dev->flags)) {
> >> +			    (test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_OVERWRITE, &dev->flags))) {
> >>  				/* treat as in-sync, but with a read error
> >>  				 * which we can now try to correct
> >>  				 */
> >>  				set_bit(R5_Insync, &dev->flags);
> >> -				set_bit(R5_ReadError, &dev->flags);
> >> +				//set_bit(R5_ReadError, &dev->flags);
> >
> > Oh, what's this for?
> 
> That shouldn't have been there.
> I think that change was me experimenting with the problem that writing
> to a known bad block doesn't always mark the block as not bad and more.
> I still had that code in my tree when I made this patch, and didn't
> notice.
> 
> This patch should only have the four line addition to md.c, not the
> change to raid5.  Could you please edit it?

Ok, done.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux