Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1


 



Hi John,

John Stultz wrote:
> On Fri, Feb 1, 2019 at 4:18 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
>> Hey all,
>>   Since the 5.0 merge window opened, I've been tripping on frequent
>> dwc3 crashes on reboot and suspend; I've added an example to the
>> bottom of this mail.
>>
>> I've dug in a little and have a rough sense of what's going on.
>>
>> In ffs_epfile_io():
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
>>
>> The completion "done" is set up on the stack:
>>   DECLARE_COMPLETION_ONSTACK(done);
>>
>> Then later we set up a request and queue it:
>>   req->context  = &done;
>>   ...
>>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
>>
>> Then wait for it:
>>   if (unlikely(wait_for_completion_interruptible(&done))) {
>>     /*
>>      * To avoid race condition with ffs_epfile_io_complete,
>>      * dequeue the request first then check
>>      * status. usb_ep_dequeue API should guarantee no race
>>      * condition with req->complete callback.
>>      */
>>     usb_ep_dequeue(ep->ep, req);
>>     interrupted = ep->status < 0;
>>   }
>>
>> The problem is that we get interrupted, supposedly dequeue the
>> request, and exit.
>>
>> But then (or in parallel) the irq fires and we call complete() on
>> the context pointer, which now points to stale stack space, and that
>> results in the panic.
>>
>> It seems like usb_ep_dequeue() isn't really stopping the irq path
>> from completing the request afterwards?
>>
>> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
>>
>> I'll do some bisection to try to narrow things down, but I wanted to
>> see if this was a known issue or if anyone had immediate ideas as to
>> what might be wrong.
> Bisecting the changes down, it seems like it's due to commit
> fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").
>
> It doesn't happen every time, so I'll need to do more testing,
> but so far I haven't been able to trigger it with the patches backed
> out to that point.
>
> thanks
> -john
>

Yeah, it sounds like the same issue. You can review the discussion here:
https://www.spinics.net/lists/linux-usb/msg176110.html

Thinh
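The comment in the ffs_epfile_io() excerpt quoted above assumes that usb_ep_dequeue() guarantees no race with the req->complete callback. Below is a minimal userspace sketch of that contract, assuming pthreads as a stand-in for the controller driver's lock and its interrupt context. Every sim_* name is hypothetical; none of this is dwc3 or f_fs code. The dequeued request is given back with -ECONNRESET, following the status the gadget API documents for dequeued requests.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model: sim_completion stands in for struct completion. */
struct sim_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void sim_init_completion(struct sim_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void sim_complete(struct sim_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void sim_wait_for_completion(struct sim_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request; "lock" models the controller driver's lock. */
struct sim_request {
    pthread_mutex_t lock;
    bool queued;
    int status;
    struct sim_completion *context;
};

static void sim_request_init(struct sim_request *req)
{
    *req = (struct sim_request){ .queued = false, .context = NULL };
    pthread_mutex_init(&req->lock, NULL);
}

/* Gives the request back at most once; caller holds req->lock. */
static bool sim_giveback_locked(struct sim_request *req, int status)
{
    if (!req->queued)
        return false;               /* already given back: never touch context */
    assert(req->context != NULL);
    req->queued = false;
    req->status = status;
    sim_complete(req->context);
    return true;
}

/* "Interrupt": completes the request only if it is still queued. */
static bool sim_irq_handler(struct sim_request *req)
{
    pthread_mutex_lock(&req->lock);
    bool gave_back = sim_giveback_locked(req, 0);
    pthread_mutex_unlock(&req->lock);
    return gave_back;
}
static void *sim_irq_thread(void *arg) { sim_irq_handler(arg); return NULL; }

/* Dequeue gives the request back synchronously, so by the time it returns
 * the callback has run and no later "interrupt" can run it again. */
static void sim_dequeue(struct sim_request *req)
{
    pthread_mutex_lock(&req->lock);
    sim_giveback_locked(req, -ECONNRESET);
    pthread_mutex_unlock(&req->lock);
}

/* Same shape as the ffs_epfile_io() excerpt: the completion lives on the stack. */
static int sim_do_io(struct sim_request *req, bool interrupted)
{
    struct sim_completion done;     /* like DECLARE_COMPLETION_ONSTACK(done) */
    sim_init_completion(&done);
    pthread_mutex_lock(&req->lock);
    req->context = &done;
    req->queued = true;             /* like usb_ep_queue() */
    pthread_mutex_unlock(&req->lock);
    if (interrupted) {
        sim_dequeue(req);           /* the wait was "interrupted" */
    } else {
        pthread_t irq;
        pthread_create(&irq, NULL, sim_irq_thread, req);
        sim_wait_for_completion(&done);
        pthread_join(irq, NULL);
    }
    return req->status;             /* "done" goes out of scope here */
}
```

The property that matters in this sketch is that giveback is idempotent under a single lock and that sim_dequeue() performs it before returning. If sim_dequeue() instead returned while the giveback was still pending, a later sim_irq_handler() call would pass the stale &done pointer to sim_complete(), which is the failure mode described in the report above.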




