On Fri, 1 Feb 2019, John Stultz wrote:

> Hey all,
>   Since the 5.0 merge window opened, I've been tripping on frequent
> dwc3 crashes on reboot and suspend, which I've added an example to the
> bottom of this mail.
>
> I've dug in a little bit and sort of have a sense of whats going on.
>
> In ffs_epfile_io():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
>
> The completion done is setup on the stack:
>   DECLARE_COMPLETION_ONSTACK(done);
>
> Then later we setup a request and queue it:
>   req->context = &done;
>   ...
>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
>
> Then wait for it:
>   if (unlikely(wait_for_completion_interruptible(&done))) {
>     /*
>      * To avoid race condition with ffs_epfile_io_complete,
>      * dequeue the request first then check
>      * status. usb_ep_dequeue API should guarantee no race
>      * condition with req->complete callback.
>      */
>     usb_ep_dequeue(ep->ep, req);

This code contains a bug: It assumes that usb_ep_dequeue() waits until
the request has been completed.  You should insert

	wait_for_completion(&done);

right here.

>     interrupted = ep->status < 0;
>   }
>
> The problem is, that we end up being interrupted, supposedly dequeue
> the request, and exit.
>
> But then (or in parallel) the irq triggers and we try calling
> complete() on the context pointer which points to now random stack
> space, which results in the panic.

This is the natural result of not waiting for the request to complete.

> It seems like something is wrong with usb_ep_dequeue not really
> stopping the irq from happening?

Certainly.  usb_ep_dequeue() just speeds up the process of completing
the request; it doesn't wait for that process to finish.

Alan Stern

> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
>
> I'll do some bisection to try to narrow things down, but I wanted to
> see if this was a known issue or if anyone had immediate ideas as to
> what might be wrong.
>
> thanks
> -john
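
The lifetime rule discussed in this thread can be illustrated outside the
kernel. What follows is only a userspace sketch, not the f_fs.c code: the
struct completion here is a hypothetical pthread-based stand-in for the
kernel type, struct request and irq_thread() are invented for illustration,
and epfile_io_sketch() merely mimics the shape of ffs_epfile_io(). It shows
that a function owning an on-stack completion has to wait for the callback
before its stack frame goes away.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request: context points at the submitter's completion. */
struct request {
	struct completion *context;
};

/* Stand-in for the controller's IRQ path, which runs asynchronously. */
static void *irq_thread(void *arg)
{
	struct request *req = arg;

	complete(req->context);	/* writes into the submitter's stack frame */
	return NULL;
}

/* Returns true once the callback has run, so leaving the frame is safe. */
bool epfile_io_sketch(void)
{
	struct completion done;	/* on the stack, as in ffs_epfile_io() */
	struct request req = { .context = &done };
	pthread_t irq;
	bool finished;

	init_completion(&done);
	if (pthread_create(&irq, NULL, irq_thread, &req) != 0)
		return false;

	/*
	 * Imagine the interruptible wait was interrupted and the request
	 * was dequeued here. Dequeueing only hurries the callback along;
	 * it does not wait for it, so we must wait ourselves.
	 */
	wait_for_completion(&done);

	pthread_join(irq, NULL);	/* the thread no longer touches 'done' */
	finished = done.done;
	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return finished;
}
```

Without the wait_for_completion() call, epfile_io_sketch() could return
while irq_thread() is still about to call complete() on a dead stack frame,
which is the userspace analogue of the panic reported above.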