On 2024-09-27 15:33:35 [+0200], Hubert Wiśniewski wrote:
> On Thu, 2024-09-26 at 21:39 +0200, Hubert Wiśniewski wrote:
> > I'm a bit at a loss here. The deadlock seems to be unrelated to netif_rx()
> > (which is not being called in the interrupt context after all), yet
> > replacing it with __netif_rx() fixes the lockup (though a warning is still
> > generated, which suggests that the patch does not completely fix the
> > issue).
>
> Well, never mind. After some investigation, I think the problem is as
> follows:
>
> 1. musb_g_giveback() releases the musb lock using spin_unlock(). The lock
> is now released, but hardirqs are still disabled.
>
> 2. Then, usb_gadget_giveback_request() is called, which in turn calls
> rx_complete(). This does not happen in the interrupt context, so netif_rx()
> disables bottom halves, then enables them using local_bh_enable().
>
> 3. This leads to calling __local_bh_enable_ip(), which gives off a warning
> (the first backtrace) that hardirqs are disabled. Then, hardirqs are
> disabled (again?), and then enabled (as they should have been in the first
> place).
>
> 4. After usb_gadget_giveback_request() returns, musb_g_giveback() acquires
> the musb lock using spin_lock(). This does not disable hardirqs, so they
> are still enabled.
>
> 5. While the musb lock is acquired, an interrupt occurs. It is handled by
> dsps_interrupt(), which acquires the musb lock. A deadlock occurs.

This all makes sense so far.

> Replacing netif_rx() with __netif_rx() apparently fixes this part, as it
> does not lead to any change of hardirq state. There is still one problem
> though: rx_complete() is usually called from the interrupt context, except
> when the network interface is brought up.

__netif_rx() has an assert which should complain if you call it from
outside of interrupt context, as happens here. Further, in this case you
pass the skb to the backlog but never kick it for processing, which means
it is delayed until a random interrupt notices and processes it.

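To make the five steps above easier to follow, here is a minimal userspace
sketch of the hardirq/lock state transitions. It is a toy single-CPU model
and not kernel code: struct cpu_model and all model_*() helpers are
hypothetical names made up for illustration, and the initial state (lock
held, hardirqs disabled) is an assumption taken from step 1.

```c
#include <stdbool.h>

/* Toy single-CPU model; nothing here is a real kernel API. */
struct cpu_model {
	bool irqs_disabled;	/* hardirqs masked on this CPU */
	bool musb_lock_held;	/* stands in for the musb spinlock */
};

/* Assumed entry state: some caller took the lock with hardirqs off. */
static void model_enter_locked_irqs_off(struct cpu_model *c)
{
	c->irqs_disabled = true;
	c->musb_lock_held = true;
}

/* Like spin_unlock(): drops the lock, leaves the IRQ state alone. */
static void model_spin_unlock(struct cpu_model *c)
{
	c->musb_lock_held = false;
}

/* Like spin_lock(): takes the lock, leaves the IRQ state alone. */
static void model_spin_lock(struct cpu_model *c)
{
	c->musb_lock_held = true;
}

/*
 * Models only the effect described in step 3: after the warning,
 * hardirqs end up enabled once local_bh_enable() returns.
 */
static void model_local_bh_enable(struct cpu_model *c)
{
	c->irqs_disabled = false;
}

/*
 * An interrupt handler that takes the same lock can only spin forever
 * if it is able to fire (hardirqs on) while the lock is already held.
 */
static bool model_irq_would_deadlock(const struct cpu_model *c)
{
	return !c->irqs_disabled && c->musb_lock_held;
}

/* Walks steps 1-5; returns true if the deadlock window is open. */
static bool model_giveback_sequence(void)
{
	struct cpu_model c = { false, false };

	model_enter_locked_irqs_off(&c);
	model_spin_unlock(&c);		/* step 1 */
	model_local_bh_enable(&c);	/* steps 2-3, via the callback */
	model_spin_lock(&c);		/* step 4 */
	return model_irq_would_deadlock(&c);	/* step 5 */
}
```

In this model model_giveback_sequence() returns true: the lock is held
again while hardirqs are enabled, which is the window in which the
interrupt handler can spin on the lock.
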
> I think one solution would be to make musb_g_giveback() use
> spin_unlock_irqrestore() and spin_lock_irqsave(), but I would need to pass
> the flags to it somehow. Also, I am not sure how that would influence other
> drivers using musb.

I would also suggest doing this, since the other solution is not
safe/correct.
There is the ->busy assignment which should cover most cases. If you drop
the lock without enabling interrupts then the interrupt can't do anything
to the EP, and another enqueue/dequeue invocation is not possible if run
on UP. On the other hand, am335x was used on PREEMPT_RT, which turns a UP
machine into SMP, so that should be covered :)

While looking at it, dequeue/enqueue during the complete callback looks
safe due to the busy flag.

Sebastian
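
For comparison, a similar userspace sketch of the irqsave/irqrestore
variant suggested above. Again this is only a toy model: struct cpu_state,
irq_flags_t and the sketch_*() helpers are hypothetical names, and it
assumes the process-context path (hardirqs enabled before the lock is
taken). How the flags would actually be passed into musb_g_giveback() is
left open, as in the mail.

```c
#include <stdbool.h>

/* Toy single-CPU model; nothing here is a real kernel API. */
struct cpu_state {
	bool irqs_disabled;
	bool lock_held;
};

/* In this model the saved "flags" are just the previous IRQ state. */
typedef bool irq_flags_t;

/* Like spin_lock_irqsave(): save IRQ state, mask IRQs, take the lock. */
static irq_flags_t sketch_lock_irqsave(struct cpu_state *c)
{
	irq_flags_t flags = c->irqs_disabled;

	c->irqs_disabled = true;
	c->lock_held = true;
	return flags;
}

/* Like spin_unlock_irqrestore(): drop the lock, restore IRQ state. */
static void sketch_unlock_irqrestore(struct cpu_state *c, irq_flags_t flags)
{
	c->lock_held = false;
	c->irqs_disabled = flags;
}

/*
 * Walks the giveback path with irqsave/irqrestore around the callback.
 * Reports the IRQ state the completion callback would see, and returns
 * true if the lock is ever held with hardirqs enabled afterwards.
 */
static bool sketch_giveback_irqsave(bool *callback_saw_irqs_disabled)
{
	struct cpu_state c = { false, false };	/* assumed: process context */
	irq_flags_t flags;
	bool window_open;

	flags = sketch_lock_irqsave(&c);	/* caller takes the lock */

	sketch_unlock_irqrestore(&c, flags);	/* giveback drops it */
	*callback_saw_irqs_disabled = c.irqs_disabled;	/* callback runs */
	flags = sketch_lock_irqsave(&c);	/* giveback retakes it */

	window_open = !c.irqs_disabled && c.lock_held;

	sketch_unlock_irqrestore(&c, flags);	/* caller unlocks */
	return window_open;
}
```

In this model the callback runs with hardirqs enabled (so re-enabling
bottom halves has nothing to warn about) and the lock is never held with
hardirqs enabled, so the function returns false.
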