On 09/25/2018 02:46 PM, Vincent Pelletier wrote: > Hello, > > On Thu, 2 Aug 2018 14:23:51 +0000, Vincent Pelletier > <plr.vincent@xxxxxxxxx> wrote: >> On Thu, 2 Aug 2018 00:45:14 +0000, "He, Bo" <bo.he@xxxxxxxxx> wrote: >>> Your patch fix the issue BUG: scheduling while atomic: >> >> Yes, although from my understanding of Felipe's answer, the actual bug >> is the "scheduling" part (sleeping in dwc3 UDC) rather than the >> "atomic" part. >> >> So my patch addresses, still if my understanding is correct, the wrong >> half of the problem, and even introduced the regression you identified. >> Hence my uncertainty... > > I notice that neither He's patch, nor a dwc3 change to prevent it from > scheduling inside usb_ep_dequeue are in Linus' master. Please correct > me if I missed something. > > Just in case my previous emails were not clear: > - I have no objection to He's patch on its own (and I do not know the > code nearly enough to provide a meaningful reviewed-by). > - I do not intend to work on making changes to dwc3 gadget to stop it > from scheduling in usb_ep_dequeue in the foreseeable future. > > So if there is no ongoing work on dwc3 cancel behaviour (Felipe ?), > please do resume/carry on with reviewing and integrating He's patch. > > It is only *if* dwc3 cancel stops scheduling that I believe my patch > should be reverted (here is the hash as of Linus' master): > commit d52e4d0c0c428bf2ba35074a7495cdb28e2efbae > Author: Vincent Pelletier <plr.vincent@xxxxxxxxx> > Date: Wed Jun 13 11:05:06 2018 +0000 > > usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers > > This bug happens only when the UDC needs to sleep during usb_ep_dequeue, > as is the case for (at least) dwc3. > and, in my understanding, a consequence is that He's fix would not > be needed anymore - the bug my patch introduced disappearing in the > revert. Hi, I just started testing with your patch applied and the patch seems to contain a race condition on its own. When a aio_cancel() is called ffs_aio_cancel_worker() is queued. Now what might happen is that the URB completes on its own before the work item has a chance to run. And it seems that ffs_copy_worker() can even overtake the cancel work. In that case the data structure associated with the URB is deleted while the cancel_worker is still queued. I get the following output on the console [ 91.974151] ODEBUG: free active (active state 0) object type: work_struct hint: ffs_aio_cancel_worker+0x0/0x20 [ 92.211124] [<ffffff8008466ca4>] debug_print_object+0xac/0xc0 [ 92.216854] [<ffffff8008467e48>] __debug_check_no_obj_freed+0x1b0/0x208 [ 92.223451] [<ffffff8008468104>] debug_check_no_obj_freed+0x1c/0x28 [ 92.229701] [<ffffff80081b1a40>] kfree+0x78/0xc8 [ 92.234302] [<ffffff80086e7060>] ffs_user_copy_worker+0xd0/0x128 [ 92.240293] [<ffffff80080b8620>] process_one_work+0x1e0/0x3c0 [ 92.246022] [<ffffff80080b8854>] worker_thread+0x54/0x410 [ 92.251403] [<ffffff80080bf8fc>] kthread+0x104/0x130 [ 92.256351] [<ffffff8008084db0>] ret_from_fork+0x10/0x18 - Lars