Re: [md PATCH 1/2] md: always clear ->safemode when md_check_recovery gets the mddev lock.

On Tue, Aug 08, 2017 at 04:56:36PM +1000, Neil Brown wrote:
> If ->safemode == 1, md_check_recovery() will try to get the mddev lock
> and perform various other checks.
> If mddev->in_sync is zero, it will call set_in_sync, and clear
> ->safemode.  However if mddev->in_sync is not zero, ->safemode will not
> be cleared.
> 
> When md_check_recovery() drops the mddev lock, the thread is woken
> up again.  Normally it would just check if there was anything else to
> do, find nothing, and go to sleep.  However as ->safemode was not
> cleared, it will take the mddev lock again, then wake itself up
> when unlocking.
> 
> This results in an infinite loop, repeatedly calling
> md_check_recovery(), which RCU or the soft-lockup detector
> will eventually complain about.
> 
> Prior to commit 4ad23a976413 ("MD: use per-cpu counter for
> writes_pending"), safemode would only be set to one when the
> writes_pending counter reached zero, and would be cleared again
> when writes_pending is incremented.  Since that patch, safemode
> is set more freely, but is not reliably cleared.
> 
> So in md_check_recovery() clear ->safemode before checking ->in_sync.

Nice catch! Applied both patches.

I spent hours trying to work out why md_check_recovery loops; apparently
I missed that set_in_sync is only called when in_sync is not set. Silly me.

Thanks,
Shaohua
 
> Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
> Cc: stable@xxxxxxxxxxxxxxx (4.12+)
> Reported-by: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
> Reported-by: David R <david@xxxxxxxxxxxxxxx>
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>  drivers/md/md.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c99634612fc4..d84aceede1cb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8656,6 +8656,9 @@ void md_check_recovery(struct mddev *mddev)
>  	if (mddev_trylock(mddev)) {
>  		int spares = 0;
>  
> +		if (mddev->safemode == 1)
> +			mddev->safemode = 0;
> +
>  		if (mddev->ro) {
>  			struct md_rdev *rdev;
>  			if (!mddev->external && mddev->in_sync)
> 
> 
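
The loop described in the commit message can be illustrated with a small
user-space model. This is only a sketch, not kernel code: struct toy_mddev,
toy_check_recovery(), toy_run_thread() and the wakeup flag are hypothetical
names made up for the illustration, the trylock is assumed to always
succeed, and the other checks md_check_recovery() performs are left out.

```c
#include <stdbool.h>

/* Toy stand-in for the few struct mddev fields discussed in the thread. */
struct toy_mddev {
	int safemode;	/* 1 means "try to mark the array clean" */
	int in_sync;	/* non-zero means the array is already marked clean */
	bool wakeup;	/* models the md thread having been woken */
};

/* Models one pass of md_check_recovery(); apply_fix selects the patched path. */
static void toy_check_recovery(struct toy_mddev *m, bool apply_fix)
{
	if (m->safemode != 1)
		return;			/* nothing to do, the thread can sleep */

	/* Assumption: the trylock always succeeds in this model. */
	if (apply_fix && m->safemode == 1)
		m->safemode = 0;	/* the lines added by the patch */

	if (!m->in_sync) {
		m->in_sync = 1;		/* stands in for set_in_sync() */
		m->safemode = 0;
	}

	/* Dropping the lock wakes the thread again. */
	m->wakeup = true;
}

/* Runs the thread until it would sleep, or until max_passes is reached. */
static int toy_run_thread(struct toy_mddev *m, bool apply_fix, int max_passes)
{
	int passes = 0;

	m->wakeup = true;
	while (m->wakeup && passes < max_passes) {
		m->wakeup = false;
		toy_check_recovery(m, apply_fix);
		passes++;
	}
	return passes;
}
```

Starting from safemode == 1 and in_sync == 1, the unpatched path never
clears safemode, so toy_run_thread() only stops at max_passes. With
apply_fix set, the first pass clears safemode and the second pass finds
nothing to do, so the thread goes back to sleep.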