Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1

On Fri, Feb 1, 2019 at 4:31 PM Thinh Nguyen <thinh.nguyen@xxxxxxxxxxxx> wrote:
>
> Hi John,
>
> John Stultz wrote:
> > Hey all,
> >   Since the 5.0 merge window opened, I've been tripping on frequent
> > dwc3 crashes on reboot and suspend. I've added an example to the
> > bottom of this mail.
> >
> > I've dug in a little bit and sort of have a sense of what's going on.
> >
> > In ffs_epfile_io():
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
> >
> > The completion "done" is set up on the stack:
> >   DECLARE_COMPLETION_ONSTACK(done);
> >
> > Then later we set up a request and queue it:
> >   req->context  = &done;
> >   ...
> >   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> >
> > Then wait for it:
> >   if (unlikely(wait_for_completion_interruptible(&done))) {
> >     /*
> >     * To avoid race condition with ffs_epfile_io_complete,
> >     * dequeue the request first then check
> >     * status. usb_ep_dequeue API should guarantee no race
> >     * condition with req->complete callback.
> >     */
> >     usb_ep_dequeue(ep->ep, req);
> >     interrupted = ep->status < 0;
> >   }
> >
> > The problem is that we end up being interrupted, supposedly dequeue
> > the request, and exit.
> >
> > But then (or in parallel) the irq triggers and we try calling
> > complete() on the context pointer which points to now random stack
> > space, which results in the panic.
> >
> > It seems like something is wrong with usb_ep_dequeue not really
> > stopping the irq from happening?
> >
> > If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
> >
> > I'll do some bisection to try to narrow things down, but I wanted to
> > see if this was a known issue or if anyone had immediate ideas as to
> > what might be wrong.
> >
>
> I'm not sure if this is related, but can you try to test using Felipe's
> testing/next branch? There is a fix to a race condition when the gadget
> driver tries to dequeue requests.
>
> See if you run into this issue again.

I'll check that out! Thanks so much for the pointer!
-john
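
For anyone following the race described in the quoted report, here is a minimal userspace sketch of the hazard and one way to avoid it. It is a model only, not the f_fs.c code and not necessarily the fix in testing/next: pthreads stand in for the kernel completion API, the late_irq() thread stands in for the controller's completion path, and the names struct request, late_irq() and model_epfile_io_interrupted() are hypothetical. The one kernel behaviour it relies on is that a dequeued request is given back with -ECONNRESET. The point it shows is that the waiter must not leave the stack frame holding "done" until the completion callback has actually run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical, cut-down stand-in for struct usb_request. */
struct request {
	struct completion *context;	/* like req->context = &done */
	int status;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Models the completion path firing some time after the dequeue call. */
static void *late_irq(void *arg)
{
	struct request *req = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	nanosleep(&delay, NULL);
	req->status = -ECONNRESET;	/* a dequeued request is given back */
	complete(req->context);		/* touches the waiter's stack object */
	return NULL;
}

/* Models the "interrupted" branch of the wait in the quoted code. */
int model_epfile_io_interrupted(void)
{
	struct completion done;		/* lives on this stack frame */
	struct request req = { .context = &done, .status = 0 };
	pthread_t irq;
	int rc, status;

	init_completion(&done);
	rc = pthread_create(&irq, NULL, late_irq, &req);
	assert(rc == 0);

	/*
	 * Pretend the interruptible wait was interrupted here and the
	 * request was dequeued. Returning now would leave late_irq()
	 * holding a pointer into a dead stack frame. Waiting without
	 * interruption keeps "done" alive until the callback has run.
	 */
	wait_for_completion(&done);
	status = req.status;

	pthread_join(irq, NULL);
	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return status;
}
```

Calling model_epfile_io_interrupted() from a small main() and building with -pthread should return -ECONNRESET once the simulated completion has fired. An alternative design would be to move the completion off the stack into a heap object whose lifetime is tied to the request.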



