Re: [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 03, 2018 at 09:19:30AM +1100, Neil Brown wrote:
> The locking protocols in md assume that a device will
> never be removed from an array during resync/recovery/reshape.
> When that isn't happening, rcu or reconfig_mutex is needed
> to protect an rdev pointer while taking a refcount.  When
> it is happening, that protection isn't needed.
> 
> Unfortunately there are cases were remove_and_add_spares() is
> called when recovery might be happening: is state_store(),
> slot_store() and hot_remove_disk().
> In each case, this is just an optimization, to try to expedite
> removal from the personality so the device can be removed from
> the array.  If resync etc is happening, we just have to wait
> for md_check_recover to find a suitable time to call
> remove_and_add_spares().
> 
> This optimization and not essential so it doesn't
> matter if it fails.
> So change remove_and_add_spares() to abort early if
> resync/recovery/reshape is happening, unless it is called
> from md_check_recovery() as part of a newly started recovery.
> The parameter "this" is only NULL when called from
> md_check_recovery() so when it is NULL, there is no need to abort.
> 
> As this can result in a NULL dereference, the fix is suitable
> for -stable.
> 
> cc: yuyufen <yuyufen@xxxxxxxxxx>
> Cc: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx>
> Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
> Cc: stable@xxxxxxxxxxxxxx (v4.8+)
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>  drivers/md/md.c    |    4 ++++
>  drivers/md/raid5.c |    4 ++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4e4dee0ec2de..926542fbc892 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8554,6 +8554,10 @@ static int remove_and_add_spares(struct mddev *mddev,
>  	int removed = 0;
>  	bool remove_some = false;
>  
> +	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> +		/* Mustn't remove devices when resync thread is running */
> +		return 0;
> +
>  	rdev_for_each(rdev, mddev) {
>  		if ((this == NULL || rdev == this) &&
>  		    rdev->raid_disk >= 0 &&
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 98ce4272ace9..3fa97dad3837 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4448,12 +4448,12 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>  		else if (is_bad) {
>  			/* also not in-sync */
>  			if (!test_bit(WriteErrorSeen, &rdev->flags) &&
> -			    test_bit(R5_UPTODATE, &dev->flags)) {
> +			    (test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_OVERWRITE, &dev->flags))) {
>  				/* treat as in-sync, but with a read error
>  				 * which we can now try to correct
>  				 */
>  				set_bit(R5_Insync, &dev->flags);
> -				set_bit(R5_ReadError, &dev->flags);
> +				//set_bit(R5_ReadError, &dev->flags);

Oh, what's this for?

>  			}
>  		} else if (test_bit(In_sync, &rdev->flags))
>  			set_bit(R5_Insync, &dev->flags);
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux