Re: [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Feb 18 2018, Shaohua Li wrote:

> On Sat, Feb 03, 2018 at 09:19:30AM +1100, Neil Brown wrote:
>> The locking protocols in md assume that a device will
>> never be removed from an array during resync/recovery/reshape.
>> When that isn't happening, rcu or reconfig_mutex is needed
>> to protect an rdev pointer while taking a refcount.  When
>> it is happening, that protection isn't needed.
>> 
>> Unfortunately there are cases were remove_and_add_spares() is
>> called when recovery might be happening: is state_store(),
>> slot_store() and hot_remove_disk().
>> In each case, this is just an optimization, to try to expedite
>> removal from the personality so the device can be removed from
>> the array.  If resync etc is happening, we just have to wait
>> for md_check_recover to find a suitable time to call
>> remove_and_add_spares().
>> 
>> This optimization and not essential so it doesn't
>> matter if it fails.
>> So change remove_and_add_spares() to abort early if
>> resync/recovery/reshape is happening, unless it is called
>> from md_check_recovery() as part of a newly started recovery.
>> The parameter "this" is only NULL when called from
>> md_check_recovery() so when it is NULL, there is no need to abort.
>> 
>> As this can result in a NULL dereference, the fix is suitable
>> for -stable.
>> 
>> cc: yuyufen <yuyufen@xxxxxxxxxx>
>> Cc: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx>
>> Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
>> Cc: stable@xxxxxxxxxxxxxx (v4.8+)
>> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>> ---
>>  drivers/md/md.c    |    4 ++++
>>  drivers/md/raid5.c |    4 ++--
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 4e4dee0ec2de..926542fbc892 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8554,6 +8554,10 @@ static int remove_and_add_spares(struct mddev *mddev,
>>  	int removed = 0;
>>  	bool remove_some = false;
>>  
>> +	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>> +		/* Mustn't remove devices when resync thread is running */
>> +		return 0;
>> +
>>  	rdev_for_each(rdev, mddev) {
>>  		if ((this == NULL || rdev == this) &&
>>  		    rdev->raid_disk >= 0 &&
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 98ce4272ace9..3fa97dad3837 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -4448,12 +4448,12 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>>  		else if (is_bad) {
>>  			/* also not in-sync */
>>  			if (!test_bit(WriteErrorSeen, &rdev->flags) &&
>> -			    test_bit(R5_UPTODATE, &dev->flags)) {
>> +			    (test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_OVERWRITE, &dev->flags))) {
>>  				/* treat as in-sync, but with a read error
>>  				 * which we can now try to correct
>>  				 */
>>  				set_bit(R5_Insync, &dev->flags);
>> -				set_bit(R5_ReadError, &dev->flags);
>> +				//set_bit(R5_ReadError, &dev->flags);
>
> Oh, what's this for?

That shouldn't have been there.
I think that change was me experimenting with the problem that writing
to a known bad block doesn't always mark the block as not bad and more.
I still had that code in my tree when I made this patch, and didn't
notice.

This patch should only have the four line addition to md.c, not the
change to raid5.  Could you please edit it?

Thanks,
NeilBrown


>
>>  			}
>>  		} else if (test_bit(In_sync, &rdev->flags))
>>  			set_bit(R5_Insync, &dev->flags);
>> 
>> 

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux