Re: [PATCH] MD: Add del_timer_sync to mddev_suspend (fix nasty panic)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 15 May 2012 23:06:14 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
wrote:

> Neil,
> 
> I've been seeing some really bad panics take place on dm-raid.c.  I've
> found that it is because the mddev->safemode_timer is firing after the
> mddev structure has been freed.  I've attached a patch to fix the
> problem below, but I have some questions (outlined in the patch header).
> I also have a debugging patch that prints something during each of the
> suspend stages and when md_write_end resets the timer so that you can
> see the problem in action - let me know if you want that patch also.
> 
>  brassow

Hi Jon,
 thanks for the patch.  It looks simple and can clearly fix a problem
so at this point in the cycle I propose to submit it to Linus as-is, even
though I'm not convinced it is perfect, and you didn't give me a s-o-b line.
See more blow.

> 
> Use del_timer_sync to remove timer before mddev_suspend finishes.
> 
> We don't want a timer going off after an mddev_suspend is called.  This is
> especially true with device-mapper, since it can call the destructor function
> immediately following a suspend.  This results in the removal (kfree) of the
> structures upon which the timer depends - resulting in a very ugly panic.
> Therefore, we add a del_timer_sync to mddev_suspend to prevent this.
> 
> <RFC>
> While this approach works, it seems that 'md_stop_writes' should handle this
> properly.  It calls del_timer_sync already, but then allows writes to /finish/
> after it is called.  Finishing writes call md_write_end, which resets the timer.
> Why call del_timer_sync at all there?

Because md_stop_writes shouldn't really be called until all incoming write
requests have completed.  md_stop_write is really meant to make sure that
write initiated by md itself have stopped.

However since the recent change to the reboot notifier, md does sometimes
call it while there could be outstanding writes.  In this case it set
safe_mode to 2 which ensures that the timer is never re-activated.

Maybe that is the "right" fix - set safe_mode to 2 inside md_stop_writes.

> 
> Device-mapper often makes use of the fact that mapping tables can be swapped
> for a particular device.  In these scenarios, it is not uncommon to have I/O
> flowing to a device when the following sequence of device-mapper primitives
> are issued:
> 	- presuspend
> 	- postsuspend
> 	- dtr
> Although I can't seem to find documentation on it, it is my impression that
> once the presuspend is complete, no nominal I/O must be allowed through or
> pending.  IOW, all I/O must be finished processing or queued.  This does not
> include driver issued I/O, like the updating of bitmaps - that I/O must be
> completed by the time postsuspend is complete.  (These are unwritten rules,
> I think, and the main suspend rule is the important one: No I/O must be allowed
> to flow or be pending once a suspend returns.)
> 
> dm-raid.c calls 'md_stop_writes' for the purpose of stopping nominal I/O.
> It is clear that some writes are still allowed to finish-up after this call
> is made.  Is this correct behavior?

If you want to stop normal IO, you should either just stop sending IO
requests, or call mddev_suspend which causes all new IO to be queued.

With md, md_stop is not allowed to be called until all normal IO is
complete.  It then calls md_stop_writes to stop md initiated IO.

So md needs the del_timer_sync in md_stop_writes because it never calls
mddev_suspend (except when changing levels).  dm needs it to be in
mddev_suspend as well which is fine for now, but maybe we can clean up bit
for 3.5.

> 
> dm-raid.c calls 'mddev_suspend' to ensure no I/O (nominal or driver-issued)
> is flowing.  This seems to be working as expected.  However, adding the
> del_timer_sync here allows us to clean up after any finished writes that
> occurred after 'md_stop_writes' - preventing a kernel panic from a possible
> 'dtr' call that may follow.
> </RFC>
> 
> RFC-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>

<pet peeve>.
No Signed-off-by: line?  why is that.

"Signed-off-by" *only* means "I certify that I have any necessary right to
submit this patch, and I agree to it being used in the way that all other
code in this project can be used" - only with more words.  See the Developer's
Certificate of Origin 1.1

Refusing to add a Signed-off-by: because you don't think the code is "ready"
yet in some sense is just plain wrong.  I never ever want to even see a patch
that doesn't have Signed-off-by, because I don't know if I've been given
permission to use it.

Certainly add an 'RFC-by' if you want to say something about the quality of
the patch, but don't for that reason exclude the signed-off-by
</pet peeve>

Thanks,
NeilBrown


> Index: linux-upstream/drivers/md/md.c
> ===================================================================
> --- linux-upstream.orig/drivers/md/md.c
> +++ linux-upstream/drivers/md/md.c
> @@ -391,6 +391,8 @@ void mddev_suspend(struct mddev *mddev)
>  	synchronize_rcu();
>  	wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
>  	mddev->pers->quiesce(mddev, 1);
> +
> +	del_timer_sync(&mddev->safemode_timer);
>  }
>  EXPORT_SYMBOL_GPL(mddev_suspend);
>  
> 

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux