Re: [PATCH 2/4] usb: musb: Fix sleeping function called from invalid context for hdrc glue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 09, 2016 at 10:54:38AM -0700, Tony Lindgren wrote:
> * Johan Hovold <johan@xxxxxxxxxx> [161109 08:40]:
> > On Tue, Nov 08, 2016 at 06:26:07PM -0700, Tony Lindgren wrote:
> > > * Johan Hovold <johan@xxxxxxxxxx> [161108 12:03]:
> > > > On Tue, Nov 08, 2016 at 10:34:13AM -0700, Tony Lindgren wrote:
> > > > > * Johan Hovold <johan@xxxxxxxxxx> [161108 10:09]:

> > > > > > In fact, the dsps timer must also be cancelled on suspend, or you could
> > > > > > end up calling dsps_check_status() while suspended (it is currently not
> > > > > > cancelled until the parent device is suspended, which could be too
> > > > > > late).
> > > > > 
> > > > > And then this should no longer be an issue either.
> > > > 
> > > > It would still be an issue as a system-suspending device could already
> > > > have been runtime-resumed so that dsps_check_status() would be called
> > > > directly from the timer function.
> > > 
> > > The glue layers should do pm_runtime_get_sync(musb->controller) which
> > > dsps glue already does. So that's the musb_core.c device instance. And
> > > looks like we have dsps_suspend() call del_timer_sync(&glue->timer)
> > > already. I think we're safe here.
> > 
> > But the point is that the controller might be RPM_ACTIVE if the
> > controller was already runtime resumed when it is system suspended.
> > 
> > Since this (and the previous) patch run the work directly from the timer
> > callback if active, it could end up accessing the controller after it
> > has been system suspended. Specifically, stopping the timer in the glue
> > (parent) suspend callback is too late to avoid this.
> > 
> > 	pm_runtime_get_sync(musb->controller);
> > 		musb_runtime_resume()
> > 			musb_restore_context();
> > 	
> > 	...
> > 
> > 	musb_suspend()
> > 		musb_save_context();
> > 
> > 	otg_timer()
> > 		pm_runtime_get();
> > 		if (pm_runtime_active(musb->controller))
> > 			dsps_check_status();
> > 		pm_runtime_put_autosuspend();
> > 
> > 	dsps_suspend()
> > 		del_timer_sync();
> > 
> 
> OK so we need to return without doing anything from otg_timer() on
> pm_runtime_get() to avoid that.

I'm afraid that won't work as pm_runtime_get() would still succeed (i.e.
even after musb_suspend()).

See 6f3c77b040fc ("PM / Runtime: let rpm_resume() succeed if RPM_ACTIVE,
even when disabled, v2").

> In the long run it would be nice to make whatever optional state polling
> musb generic with just a glue layer callback.

Yes, and make sure to stop polling in musb_suspend(). Would it be
possible to use the enable and disable ops for this until then?

> > > +/*
> > > + * Called to run work if device is active or else queue the work to happen
> > > + * on resume. Caller must take musb->lock.
> > 
> > Caller must also hold an RPM reference.
> 
> Good point, will add.
> 
> > > +	if (musb->is_runtime_suspended) {
> > > +		list_add_tail(&w->node, &musb->pending_list);
> > > +		error = 0;
> > > +	} else {
> > > +		dev_err(musb->controller, "could not add resume work %p\n",
> > > +			callback);
> > > +		devm_kfree(musb->controller, w);
> > > +		error = -EINPROGRESS;
> > 
> > But this means you should be able to run the callback below, right? It
> > has to be run from somewhere so otherwise the caller needs to retry
> > instead.
> 
> Well there's no longer need to run the callback at that point any longer
> and with that removed that should not be an issue.
> 
> Anyways, musb->is_runtime_suspended is needed to protect anything from
> being queued between runtime resume having called musb_run_resume_work()
> and before pm_runtime_active() is true. At that point the caller could
> just wait for pm_runtime_active() to be set and run the code. But based
> on my tests that does not happen and queueing is faster than getting to
> the pm_runtime_active() state so we just print errors in those cases if
> we ever hit it later on.

But for a generic solution, this race could still be an issue. What
about musb_gadget_queue() for example? Would not failing to start I/O
if racing with resume be a problem?

> Sounds like the rest of your comments are no longer an issue, please
> let me know if that's not the case.

I think the barrier comment for the WARN_ON on musb_runtime_suspend()
still applies.

> 8< ---------------------------
> From tony Mon Sep 17 00:00:00 2001
> From: Tony Lindgren <tony@xxxxxxxxxxx>
> Date: Wed, 2 Nov 2016 19:59:05 -0700
> Subject: [PATCH] usb: musb: Fix sleeping function called from invalid
>  context for hdrc glue
> 
> Commit 65b3f50ed6fa ("usb: musb: Add PM runtime support for MUSB DSPS
> glue layer") wrongly added a call for pm_runtime_get_sync to otg_timer
> that runs in softirq context. That causes a "BUG: sleeping function called
> from invalid context" every time when polling the cable status:
> 
> [<c015ebb4>] (__might_sleep) from [<c0413d60>] (__pm_runtime_resume+0x9c/0xa0)
> [<c0413d60>] (__pm_runtime_resume) from [<c04d0bc4>] (otg_timer+0x3c/0x254)
> [<c04d0bc4>] (otg_timer) from [<c0191180>] (call_timer_fn+0xfc/0x41c)
> [<c0191180>] (call_timer_fn) from [<c01915c0>] (expire_timers+0x120/0x210)
> [<c01915c0>] (expire_timers) from [<c0191acc>] (run_timer_softirq+0xa4/0xdc)
> [<c0191acc>] (run_timer_softirq) from [<c010168c>] (__do_softirq+0x12c/0x594)
> 
> I did not notice that as I did not have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> And looks like also musb_gadget_queue() suffers from the same problem.
> 
> Let's fix the issue by using a list of delayed work then call it on
> resume. Note that we want to do this only when musb core and it's
> parent devices are awake as noted by Johan Hovold <johan@xxxxxxxxxx>.
> 
> Later on we may be able to remove other delayed work in the musb driver
> and just do it from pending_resume_work. But this should be done only
> for delayed work that does not have other timing requirements beyond
> just being run on resume.
> 
> Fixes: 65b3f50ed6fa ("usb: musb: Add PM runtime support for MUSB DSPS
> glue layer")
> Reported-by: Johan Hovold <johan@xxxxxxxxxx>
> Signed-off-by: Tony Lindgren <tony@xxxxxxxxxxx>
> ---
>  drivers/usb/musb/musb_core.c   | 112 +++++++++++++++++++++++++++++++++++++++--
>  drivers/usb/musb/musb_core.h   |   7 +++
>  drivers/usb/musb/musb_dsps.c   |  34 +++++++++----
>  drivers/usb/musb/musb_gadget.c |  21 ++++++--
>  4 files changed, 157 insertions(+), 17 deletions(-)
 
> @@ -2622,14 +2714,18 @@ static int musb_runtime_suspend(struct device *dev)
>  {
>  	struct musb	*musb = dev_to_musb(dev);
>  
> +	WARN_ON(!list_empty(&musb->pending_list));
>  	musb_save_context(musb);
> +	musb->is_runtime_suspended = 1;

Isn't the compiler free to reorder the store to is_runtime_suspended
here so that the WARN_ON could trigger when queuing is racing with
runtime suspend?

Removing the WARN_ON, or moving it under list_lock while updating
is_runtime_suspended should do.

>  	return 0;
>  }

> +static void otg_timer(unsigned long _musb)
> +{
> +	struct musb *musb = (void *)_musb;
> +	struct device *dev = musb->controller;
> +	unsigned long flags;
> +	int err;
> +
> +	err = pm_runtime_get(dev);
> +	if (err < 0) {
> +		dev_err(dev, "Poll could not pm_runtime_get: %i\n", err);

So as mentioned above, this would not be sufficient to detect that dev
has been (system) suspended.

> +
> +		return;
> +	}
> +
> +	spin_lock_irqsave(&musb->lock, flags);
> +	err = musb_queue_resume_work(musb, dsps_check_status, NULL);
> +	if (err < 0)
> +		dev_err(dev, "%s resume work: %i\n", __func__, err);
> +	spin_unlock_irqrestore(&musb->lock, flags);
>  	pm_runtime_mark_last_busy(dev);
>  	pm_runtime_put_autosuspend(dev);
>  }
> diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c
> --- a/drivers/usb/musb/musb_gadget.c
> +++ b/drivers/usb/musb/musb_gadget.c
> @@ -1222,6 +1222,15 @@ void musb_ep_restart(struct musb *musb, struct musb_request *req)
>  		rxstate(musb, req);
>  }
>  
> +static int musb_ep_restart_resume_work(struct musb *musb, void *data)
> +{
> +	struct musb_request *req = data;
> +
> +	musb_ep_restart(musb, req);
> +
> +	return 0;
> +}
> +
>  static int musb_gadget_queue(struct usb_ep *ep, struct usb_request *req,
>  			gfp_t gfp_flags)
>  {
> @@ -1255,7 +1264,7 @@ static int musb_gadget_queue(struct usb_ep *ep, struct usb_request *req,
>  
>  	map_dma_buffer(request, musb, musb_ep);
>  
> -	pm_runtime_get_sync(musb->controller);
> +	pm_runtime_get(musb->controller);

Add the missing error handling here too?

Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux