Re: [PATCH - RFC] MD: Sync thread not properly shutdown after mddev_suspend()

On Thu, 02 May 2013 15:19:23 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
wrote:

> MD: Sync thread not properly shutdown after mddev_suspend()
> 
> After performing an 'md_stop_writes' followed by an 'mddev_suspend',
> it is possible to have 'MD_RECOVERY_RUNNING' set in mddev->recovery.
> It doesn't happen often, but when it does, the recovery thread does
> not restart properly after a resume.
> 
> The problem seems to come from 'md_stop_writes'.  This function is a
> wrapper around '__md_stop_writes' - surrounding it with mddev_[un]lock
> calls.  While '__md_stop_writes' properly cleans up the sync thread,
> the subsequent 'mddev_unlock' call will wake up the personality thread,
> which in turn calls 'md_check_recovery' - a function that sets
> mddev->recovery flags and potentially launches the sync thread.
> Effectively, this can undo what has just been done.
> 
> When 'mddev_suspend' is called, it sets the mddev->suspended variable.
> This variable causes 'md_check_recovery' to simply return if set.  Thus,
> it is better to reap the sync thread in mddev_suspend, because it cannot
> be respawned until mddev_resume is called.
> 
> There are probably several ways to solve this problem.  The simplest way
> was to add 'md_reap_sync_thread' to mddev_suspend.  It may be
> better fixed in 'md_stop_writes' though.  We could also combine
> 'md_stop_writes' and 'mddev_suspend' by calling '__md_stop_writes' from
> within 'mddev_suspend' after mddev->suspended has been set.
> 
> Thoughts?

Thanks for the thorough analysis.

Your patch looks like it would work, but it involves calling
md_reap_sync_thread() twice, which is a little ugly.

How about this:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4c74424..3e2acfa 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5277,8 +5277,8 @@ static void md_clean(struct mddev *mddev)
 
 static void __md_stop_writes(struct mddev *mddev)
 {
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	if (mddev->sync_thread) {
-		set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 		md_reap_sync_thread(mddev);
 	}


Callers of md_stop_writes() already need to be prepared for
MD_RECOVERY_FROZEN to get set, and raid_resume() clears it for dm-raid.c, so
it should be safe.
md_check_recovery() won't start anything while MD_RECOVERY_FROZEN is set,
so this should *really* stop writes going to the devices.
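
To illustrate the reasoning, here is a toy userspace model of the flag
handling. It is only a sketch and not kernel code: every toy_* name is
hypothetical, the bitmask stands in for mddev->recovery, and
toy_check_recovery() assumes that recovery is always wanted, whereas the real
md_check_recovery() checks many more conditions. It models the case where no
sync thread exists at the moment writes are stopped, which is the case the
patch above changes.

```c
#include <stdbool.h>

/* Toy stand-ins for the MD_RECOVERY_* bits (hypothetical values). */
enum {
	TOY_RECOVERY_RUNNING = 1u << 0,
	TOY_RECOVERY_FROZEN  = 1u << 1,
	TOY_RECOVERY_INTR    = 1u << 2,
};

/* Toy stand-in for the parts of struct mddev that matter here. */
struct toy_mddev {
	unsigned int recovery;	/* bitmask of TOY_RECOVERY_* */
	bool sync_thread;	/* true while a sync thread exists */
};

/* Simplified model of reaping: drop the thread and clear RUNNING. */
static void toy_reap_sync_thread(struct toy_mddev *m)
{
	m->sync_thread = false;
	m->recovery &= ~TOY_RECOVERY_RUNNING;
}

/*
 * Simplified model of md_check_recovery(): do nothing while FROZEN is
 * set, otherwise start a sync thread (assumes recovery is always wanted).
 */
static void toy_check_recovery(struct toy_mddev *m)
{
	if (m->recovery & TOY_RECOVERY_FROZEN)
		return;
	m->recovery |= TOY_RECOVERY_RUNNING;
	m->sync_thread = true;
}

/* Before the patch: FROZEN is set only if a sync thread already exists. */
static void toy_stop_writes_old(struct toy_mddev *m)
{
	if (m->sync_thread) {
		m->recovery |= TOY_RECOVERY_FROZEN;
		m->recovery |= TOY_RECOVERY_INTR;
		toy_reap_sync_thread(m);
	}
}

/* After the patch: FROZEN is set unconditionally. */
static void toy_stop_writes_new(struct toy_mddev *m)
{
	m->recovery |= TOY_RECOVERY_FROZEN;
	if (m->sync_thread) {
		m->recovery |= TOY_RECOVERY_INTR;
		toy_reap_sync_thread(m);
	}
}
```

In this model, starting from an idle device (no flags, no sync thread) and
calling toy_stop_writes_old() followed by toy_check_recovery() leaves a sync
thread running, while the same sequence with toy_stop_writes_new() does not.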

Make sense?

Thanks,
NeilBrown



> 
> Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>
> 
> Index: linux-upstream/drivers/md/md.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/md.c
> +++ linux-upstream/drivers/md/md.c
> @@ -360,6 +360,7 @@ void mddev_suspend(struct mddev *mddev)
>  	mddev->pers->quiesce(mddev, 1);
>  
>  	del_timer_sync(&mddev->safemode_timer);
> +	md_reap_sync_thread(mddev);
>  }
>  EXPORT_SYMBOL_GPL(mddev_suspend);
>  
> 
> 
