Hi,

Vincent Pelletier <plr.vincent@xxxxxxxxx> writes:
> On Sat, 25 Nov 2017 16:39:52 +0000, Vincent Pelletier
> <plr.vincent@xxxxxxxxx> wrote:
>> To my surprise, the error symptom does not seem to change:
>
> Having read some more on kernel debugging and especially critical
> sections, I realise that while the general issue is still there, the
> symptom did change consistently with modified code: what was
>
>> [ 382.207124] 4 locks held by screen/1808:
>> [ 382.211266] #0: (rcu_callback){....}, at: [<c10b4ff0>] rcu_process_callbacks+0x260/0x440
>> [ 382.219949] #1: (rcu_read_lock_sched){....}, at: [<c1358ba0>] percpu_ref_switch_to_atomic_rcu+0xb0/0x130
>> [ 382.230034] #2: (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>] free_ioctx_users+0x23/0xd0
>> [ 382.230096] #3: (&(&ffs->eps_lock)->rlock){....}, at: [<f81e7710>] ffs_aio_cancel+0x20/0x60 [usb_f_fs]
>
> became
>
>> [ 382.511767] 3 locks held by swapper/1/0:
>> [ 382.515903] #0: (rcu_callback){....}, at: [<c10b4ff0>] rcu_process_callbacks+0x260/0x440
>> [ 382.524572] #1: (rcu_read_lock_sched){....}, at: [<c1358ba0>] percpu_ref_switch_to_atomic_rcu+0xb0/0x130
>> [ 382.534650] #2: (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>] free_ioctx_users+0x23/0xd0
>
> Then, I looked a bit at these. free_ioctx_users is called via
> percpu_ref_init, which specifies that:
> /**
>  * percpu_ref_init - initialize a percpu refcount
>  [...]
>  * Note that @release must not sleep - it may potentially be called from RCU
>  * callback context by percpu_ref_kill().
>  */
>
> On the other hand, if I understand dwc3_gadget_ep_dequeue correctly it
> has to wait for hardware to confirm it will not touch the transfer, so
> some sleeping seems required.
>
> So far I lack proper knowledge to tell how to get both sides to agree.
>
> Taking a peek at dwc2, I see it does not call wait_event_lock_irq but
> instead does a busy loop checking chip registers and waiting 1µs
> between iterations (I guess this does not count as "sleeping", as I
> think no context switch can happen).
>
> In dwc3, DWC3_EP_END_TRANSFER_PENDING flag gets cleared during
> interrupt handling (bottom-half handler) and not by polling a
> register, so it does not seem possible (...or at least trivial) to
> transpose the dwc2 way, so I'm not sure where to go from here.

as I said, the *only* thing that schedules from inside
dwc3_gadget_ep_dequeue() is wait_event_lock_irq(), which works fine
unless usb_ep_dequeue() is called with locks held and IRQs disabled.

If my original suggestion didn't help, then there may be other bugs in
f_fs.c.

-- 
balbi