Re: BUG with linux 5.9.0 with dwc3 in gadget mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ferry Toth wrote:
> Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
>> Ferry Toth wrote:
>>> Op 21-10-2020 om 00:58 schreef Thinh Nguyen:
>>>> Thinh Nguyen wrote:
>>>>> Hi,
>>>>>
>>>>> Ferry Toth wrote:
>>>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ferry Toth <fntoth@xxxxxxxxx> writes:
>>>>>>>
>>>>>>> 8< snip
>>>>>>>
>>>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>>>> It this something you can reproduce on your end? Ferry, can
>>>>>>>>> you get
>>>>>>>>> dwc3
>>>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>>>> sufficient or
>>>>>>>> do I need to do more?
>>>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>>>
>>>>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>>>>> properly
>>>>>>>> either. No oops, but no driver get loaded on device plug in.
>>>>>>> okay
>>>>>>>
>>>>>> Ehem, you maybe only me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>>>
>>>>>> root@edison:/boot# uname -a
>>>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19
>>>>>> 20:17:04 UTC
>>>>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> root@edison:/boot# echo 1 >
>>>>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>>>> root@edison:/boot#
>>>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>>>>> dereference, address: 0000000000000000
>>>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>>>> [ 2608.602549] PGD 0 P4D 0
>>>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>>>>> 5.9.0-edison-acpi-standard #1
>>>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA
>>>>>> BAY,
>>>>>> BIOS 542 2015.01.21:18.19.48
>>>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>>>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>>>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee
>>>>>> 45 31
>>>>>> f6 4c
>>>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>>>>> 0000000000000000
>>>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>>>>> ffff9a5f46f48158
>>>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>>>>> ffff9a5f7dd6a000
>>>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>>>>> ffff9a5f46f48158
>>>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>>>>> ffff9a5f7ad77e00
>>>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>>>>> 00000000001006e0
>>>>>> [ 2608.711102] Call Trace:
>>>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>>> I took a look at how the audio function is handling switching
>>>>> alternate
>>>>> setting and dequeuing endpoints, and I think I found the issue.
>>>>>
>>>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>>>
>>>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep
>>>>> *ep)
>>>>> {
>>>>>       .....
>>>>>           for (i = 0; i < params->req_number; i++) {
>>>>>                   if (prm->ureq[i].req) {
>>>>>                           usb_ep_dequeue(ep, prm->ureq[i].req);
>>>>>                           usb_ep_free_request(ep, prm->ureq[i].req);
>>>>>                           prm->ureq[i].req = NULL;
>>>>>                   }
>>>>>           }
>>>>>     ....
>>>>>
>>>>>
>>>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>>>> the request until it gives back the request. Freeing the request
>>>>> immediately here will cause a problem.
>>>>
>>>> To confirm my suspicion, can you try this and see if you still get
>>>> oops?
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index eec8e9a9e3ed..b66eb24ec070 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep
>>>> *ep,
>>>>                           list_for_each_entry_safe(r, t,
>>>> &dep->started_list, list)
>>>>                                  
>>>> dwc3_gadget_move_cancelled_request(r);
>>>>    +
>>>> dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>>>                           goto out;
>>>>                   }
>>>>           }
>>>>
>>>>
>>>> This will make usb_ep_dequeue() synchronous. (Note that this is not
>>>> tested).
>>>
>>> Unfortunately, it doesn't work. The trace changes to:
>>> root@edison:~# [  104.418264] BUG: kernel NULL pointer dereference,
>>> address: 0000000000000000
>>> [  104.425227] #PF: supervisor instruction fetch in kernel mode
>>> [  104.430877] #PF: error_code(0x0010) - not-present page
>>> [  104.436007] PGD 0 P4D 0
>>> [  104.438547] Oops: 0010 [#1] SMP PTI
>>> [  104.442039] CPU: 1 PID: 605 Comm: irq/15-dwc3 Not tainted
>>> 5.9.0-edison-acpi-standard #1
>>> [  104.450027] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>> BIOS 542 2015.01.21:18.19.48
>>> [  104.458802] RIP: 0010:0x0
>>> [  104.461425] Code: Bad RIP value.
>>> [  104.464649] RSP: 0018:ffffae584034fbf8 EFLAGS: 00010046
>>> [  104.469870] RAX: 0000000000000000 RBX: ffff8c198608a028 RCX:
>>> ffff8c19bb87fa00
>>> [  104.476993] RDX: 00000000ffffff94 RSI: ffff8c19bafa54e0 RDI:
>>> ffff8c198609ee00
>>> [  104.484118] RBP: ffff8c19bafa54e0 R08: 0000000000000046 R09:
>>> 0000000000000238
>>> [  104.491241] R10: 000000000000002c R11: ffff8c19bcf62490 R12:
>>> ffff8c198609ee00
>>> [  104.498366] R13: ffff8c198608a028 R14: 0000000000000002 R15:
>>> ffff8c19bb8ff000
>>> [  104.505493] FS:  0000000000000000(0000) GS:ffff8c19be300000(0000)
>>> knlGS:0000000000000000
>>> [  104.513572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  104.519309] CR2: ffffffffffffffd6 CR3: 000000002e80a000 CR4:
>>> 00000000001006e0
>>> [  104.526432] Call Trace:
>>> [  104.528892]  dwc3_gadget_giveback+0xbf/0x120
>>> [  104.533169]  __dwc3_gadget_ep_disable+0xc5/0x250
>>> [  104.537789]  dwc3_gadget_ep_disable+0x3d/0xd0
>>> [  104.542149]  usb_ep_disable+0x1d/0x80
>>> [  104.545823]  u_audio_stop_capture+0x87/0x9a [u_audio]
>>> [  104.550880]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>> [  104.555594]  composite_setup+0x20f/0x1b20 [libcomposite]
>>> [  104.560912]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>> [  104.566921]  configfs_composite_setup+0x6b/0x90 [libcomposite]
>>> [  104.572752]  dwc3_ep0_delegate_req+0x24/0x40
>>> [  104.577022]  dwc3_ep0_interrupt+0x40a/0x9d8
>>> [  104.581205]  dwc3_thread_interrupt+0x880/0xf70
>>>
>>
>> Oops, looks like I can't make it synchronous this way. Can you try
>> Jack's change to the u_audio.c instead?
>
> Oops indeed goes away with Jack's change, but usb connection goes
> up/down continuously, meaning: my host sees usb network and audio
> device appearing / disappearing.
>

Ok, thanks for verifying that it went away.

> mass_storage device does not appear all.

There are some fixes to dwc3 in kernel mainline. Is it possible to test
this against linux-next?

Thanks,
Thinh

>
> Host:
> 21-10-2020 22:36    kernel    sd 7:0:0:0: [sdd] tag#0 device offline
> or changed
> 21-10-2020 22:36    kernel    blk_update_request: I/O error, dev sdd,
> sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> 21-10-2020 22:36    kernel    Buffer I/O error on dev sdd, logical
> block 0, async page read
>
> And on edison:
> Oct 21 22:37:37 edison kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0:
> link becomes ready
> Oct 21 22:37:37 edison kernel[499]: [  436.595952] IPv6:
> ADDRCONF(NETDEV_CHANGE): usb0: link b>
> Oct 21 22:37:37 edison systemd-networkd[521]: usb0: Gained carrier
> Oct 21 22:37:38 edison systemd-networkd[521]: usb0: Gained IPv6LL
> Oct 21 22:38:07 edison systemd-networkd[521]: usb0: Lost carrier
> Oct 21 22:38:07 edison systemd-journald[435]: Forwarding to syslog
> missed 4 messages.
>





[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux