Re: musb RPM sleep-while-atomic in 4.9-rc1

Johan Hovold <johan@xxxxxxxxxx> · Thu, 27 Oct 2016 20:45:07 +0200

On Thu, Oct 27, 2016 at 10:40:17AM -0700, Tony Lindgren wrote:
> * Johan Hovold <johan@xxxxxxxxxx> [161027 09:45]:
> > On Thu, Oct 27, 2016 at 08:14:46AM -0700, Tony Lindgren wrote:
> > > * Tony Lindgren <tony@xxxxxxxxxxx> [161026 07:32]:
> > 
> > > 8< -------------------------------
> > > From tony Mon Sep 17 00:00:00 2001
> > > From: Tony Lindgren <tony@xxxxxxxxxxx>
> > > Date: Tue, 25 Oct 2016 08:42:00 -0700
> > > Subject: [PATCH] usb: musb: Fix sleeping function called from invalid
> > >  context for hdrc glue
> > > 
> > > Commit 65b3f50ed6fa ("usb: musb: Add PM runtime support for MUSB DSPS
> > > glue layer") wrongly added a call for pm_runtime_get_sync to otg_timer
> > > that runs in softirq context. That causes a "BUG: sleeping function called
> > > from invalid context" every time when polling the cable status:
> > > 
> > > [<c015ebb4>] (__might_sleep) from [<c0413d60>] (__pm_runtime_resume+0x9c/0xa0)
> > > [<c0413d60>] (__pm_runtime_resume) from [<c04d0bc4>] (otg_timer+0x3c/0x254)
> > > [<c04d0bc4>] (otg_timer) from [<c0191180>] (call_timer_fn+0xfc/0x41c)
> > > [<c0191180>] (call_timer_fn) from [<c01915c0>] (expire_timers+0x120/0x210)
> > > [<c01915c0>] (expire_timers) from [<c0191acc>] (run_timer_softirq+0xa4/0xdc)
> > > [<c0191acc>] (run_timer_softirq) from [<c010168c>] (__do_softirq+0x12c/0x594)
> > > 
> > > I did not notice that as I did not have CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> > > 
> > > Let's fix the issue by adding dsps_check_status() and then register a
> > > callback with musb_runtime_resume() so it gets called only when musb core
> > > and it's parent devices are awake. Note that we don't want to do this from
> > > PM runtime resume in musb_dsps.c as musb core is not awake yet at that
> > > point as noted by Johan Hovold <johan@xxxxxxxxxx>.
> > 
> > It seems some chunks are missing since this patch only runs the
> > deferred work at remove and not at runtime resume, effectively breaking
> > detection.
> 
> Oops sorry yeah looks like I had that in a separate debug hacks patch..
> 
> > > Note that musb_gadget_queue() also suffers from a similar issue when
> > > connecting the cable and cannot use pm_runtime_get_sync().
> > 
> > You seem to have left that pm_runtime_get_sync() in there though.
> 
> And that one I must have hosed when cleaning up, thanks for noticing
> these. Updated patch below.

I had a couple of inline comments to the previous version about locking
in the gadget code as well (hidden after too much context). Looks like
there's a lock missing for the deferred work, and something that seems
like a possible ABBA deadlock.

> diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c
> --- a/drivers/usb/musb/musb_gadget.c
> +++ b/drivers/usb/musb/musb_gadget.c
> @@ -1222,6 +1222,13 @@ void musb_ep_restart(struct musb *musb, struct musb_request *req)
>  		rxstate(musb, req);
>  }
>  
> +void musb_ep_restart_resume_work(struct musb *musb, void *data)
> +{
> +	struct musb_request *req = data;
> +
> +	musb_ep_restart(musb, req);

This one is supposed to be called with musb->lock held (according to the
function header anyway).

> +}
> +
>  static int musb_gadget_queue(struct usb_ep *ep, struct usb_request *req,
>  			gfp_t gfp_flags)
>  {
> @@ -1255,7 +1262,7 @@ static int musb_gadget_queue(struct usb_ep *ep, struct usb_request *req,
>  
>  	map_dma_buffer(request, musb, musb_ep);
>  
> -	pm_runtime_get_sync(musb->controller);
> +	pm_runtime_get(musb->controller);
>  	spin_lock_irqsave(&musb->lock, lockflags);
>  
>  	/* don't queue if the ep is down */
> @@ -1271,8 +1278,13 @@ static int musb_gadget_queue(struct usb_ep *ep, struct usb_request *req,
>  	list_add_tail(&request->list, &musb_ep->req_list);
>  
>  	/* it this is the head of the queue, start i/o ... */
> -	if (!musb_ep->busy && &request->list == musb_ep->req_list.next)
> -		musb_ep_restart(musb, request);
> +	if (!musb_ep->busy && &request->list == musb_ep->req_list.next) {
> +		if (pm_runtime_active(musb->controller))
> +			musb_ep_restart(musb, request);
> +		else
> +			musb_queue_on_resume(musb, musb_ep_restart_resume_work,
> +					     request);
> +	}

But then this looks like it could trigger an ABBA deadlock as musb->lock
is held while queue_on_resume() takes musb->list_lock, and
musb_run_pending() would take the same locks in the reverse order.

>  
>  unlock:
>  	spin_unlock_irqrestore(&musb->lock, lockflags);

Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html