Re: usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers

Lars-Peter Clausen <lars@xxxxxxxxxx> · Tue, 23 Oct 2018 14:20:45 +0200

On 09/25/2018 02:46 PM, Vincent Pelletier wrote:
> Hello,
> 
> On Thu, 2 Aug 2018 14:23:51 +0000, Vincent Pelletier
> <plr.vincent@xxxxxxxxx> wrote:
>> On Thu, 2 Aug 2018 00:45:14 +0000, "He, Bo" <bo.he@xxxxxxxxx> wrote:
>>> Your patch fix the issue BUG: scheduling while atomic:  
>>
>> Yes, although from my understanding of Felipe's answer, the actual bug
>> is the "scheduling" part (sleeping in dwc3 UDC) rather than the
>> "atomic" part.
>>
>> So my patch addresses, still if my understanding is correct, the wrong
>> half of the problem, and even introduced the regression you identified.
>> Hence my uncertainty...
> 
> I notice that neither He's patch, nor a dwc3 change to prevent it from
> scheduling inside usb_ep_dequeue are in Linus' master. Please correct
> me if I missed something.
> 
> Just in case my previous emails were not clear:
> - I have no objection to He's patch on its own (and I do not know the
>   code nearly enough to provide a meaningful reviewed-by).
> - I do not intend to work on making changes to dwc3 gadget to stop it
>   from scheduling in usb_ep_dequeue in the foreseeable future.
> 
> So if there is no ongoing work on dwc3 cancel behaviour (Felipe ?),
> please do resume/carry on with reviewing and integrating He's patch.
> 
> It is only *if* dwc3 cancel stops scheduling that I believe my patch
> should be reverted (here is the hash as of Linus' master):
>   commit d52e4d0c0c428bf2ba35074a7495cdb28e2efbae
>   Author: Vincent Pelletier <plr.vincent@xxxxxxxxx>
>   Date:   Wed Jun 13 11:05:06 2018 +0000
> 
>       usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers
> 
>       This bug happens only when the UDC needs to sleep during usb_ep_dequeue,
>       as is the case for (at least) dwc3.
> and, in my understanding, a consequence is that He's fix would not
> be needed anymore - the bug my patch introduced disappearing in the
> revert.

Hi,

I just started testing with your patch applied and the patch seems to
contain a race condition on its own.

When a aio_cancel() is called ffs_aio_cancel_worker() is queued. Now what
might happen is that the URB completes on its own before the work item has a
chance to run. And it seems that ffs_copy_worker() can even overtake the
cancel work. In that case the data structure associated with the URB is
deleted while the cancel_worker is still queued.

I get the following output on the console

[   91.974151] ODEBUG: free active (active state 0) object type: work_struct
hint: ffs_aio_cancel_worker+0x0/0x20
[   92.211124] [<ffffff8008466ca4>] debug_print_object+0xac/0xc0
[   92.216854] [<ffffff8008467e48>] __debug_check_no_obj_freed+0x1b0/0x208
[   92.223451] [<ffffff8008468104>] debug_check_no_obj_freed+0x1c/0x28
[   92.229701] [<ffffff80081b1a40>] kfree+0x78/0xc8
[   92.234302] [<ffffff80086e7060>] ffs_user_copy_worker+0xd0/0x128
[   92.240293] [<ffffff80080b8620>] process_one_work+0x1e0/0x3c0
[   92.246022] [<ffffff80080b8854>] worker_thread+0x54/0x410
[   92.251403] [<ffffff80080bf8fc>] kthread+0x104/0x130
[   92.256351] [<ffffff8008084db0>] ret_from_fork+0x10/0x18

- Lars