On Sat, Oct 05, 2024 at 03:19:10PM +0200, Hubert Wiśniewski wrote: > There is a possibility that a request's callback could be invoked from > usb_ep_queue() (call trace below, supplemented with missing calls): > > req->complete from usb_gadget_giveback_request > (drivers/usb/gadget/udc/core.c:999) > usb_gadget_giveback_request from musb_g_giveback > (drivers/usb/musb/musb_gadget.c:147) > musb_g_giveback from rxstate > (drivers/usb/musb/musb_gadget.c:784) > rxstate from musb_ep_restart > (drivers/usb/musb/musb_gadget.c:1169) > musb_ep_restart from musb_ep_restart_resume_work > (drivers/usb/musb/musb_gadget.c:1176) > musb_ep_restart_resume_work from musb_queue_resume_work > (drivers/usb/musb/musb_core.c:2279) > musb_queue_resume_work from musb_gadget_queue > (drivers/usb/musb/musb_gadget.c:1241) > musb_gadget_queue from usb_ep_queue > (drivers/usb/gadget/udc/core.c:300) > > According to the docstring of usb_ep_queue(), this should not happen: > > "Note that @req's ->complete() callback must never be called from within > usb_ep_queue() as that can create deadlock situations." > > In fact, a hardware lockup might occur in the following sequence: > > 1. The gadget is initialized using musb_gadget_enable(). > 2. Meanwhile, a packet arrives, and the RXPKTRDY flag is set, raising an > interrupt. > 3. If IRQs are enabled, the interrupt is handled, but musb_g_rx() finds an > empty queue (next_request() returns NULL). The interrupt flag has > already been cleared by the glue layer handler, but the RXPKTRDY flag > remains set. > 4. The first request is enqueued using usb_ep_queue(), leading to the call > of req->complete(), as shown in the call trace above. > 5. If the callback enables IRQs and another packet is waiting, step (3) > repeats. The request queue is empty because usb_g_giveback() removes the > request before invoking the callback. > 6. The endpoint remains locked up, as the interrupt triggered by hardware > setting the RXPKTRDY flag has been handled, but the flag itself remains > set. > > For this scenario to occur, it is only necessary for IRQs to be enabled at > some point during the complete callback. This happens with the USB Ethernet > gadget, whose rx_complete() callback calls netif_rx(). If called in the > task context, netif_rx() disables the bottom halves (BHs). When the BHs are > re-enabled, IRQs are also enabled to allow soft IRQs to be processed. The > gadget itself is initialized at module load (or at boot if built-in), but > the first request is enqueued when the network interface is brought up, > triggering rx_complete() in the task context via ioctl(). If a packet > arrives while the interface is down, it can prevent the interface from > receiving any further packets from the USB host. > > The situation is quite complicated with many parties involved. This > particular issue can be resolved in several possible ways: > > 1. Ensure that callbacks never enable IRQs. This would be difficult to > enforce, as discovering how netif_rx() interacts with interrupts was > already quite challenging and u_ether is not the only function driver. > Similar "bugs" could be hidden in other drivers as well. > 2. Disable MUSB interrupts in musb_g_giveback() before calling the callback > and re-enable them afterwars (by calling musb_{dis,en}able_interrupts(), > for example). This would ensure that MUSB interrupts are not handled > during the callback, even if IRQs are enabled. In fact, it would allow > IRQs to be enabled when releasing the lock. However, this feels like an > inelegant hack. > 3. Modify the interrupt handler to clear the RXPKTRDY flag if the request > queue is empty. While this approach also feels like a hack, it wastes > CPU time by attempting to handle incoming packets when the software is > not ready to process them. > 4. Flush the Rx FIFO instead of calling rxstate() in musb_ep_restart(). > This ensures that the hardware can receive packets when there is at > least one request in the queue. Once IRQs are enabled, the interrupt > handler will be able to correctly process the next incoming packet > (eventually calling rxstate()). This approach may cause one or two > packets to be dropped (two if double buffering is enabled), but this > seems to be a minor issue, as packet loss can occur when the software is > not yet ready to process them. Additionally, this solution makes the > gadget driver compliant with the rule mentioned in the docstring of > usb_ep_queue(). > > There may be additional solutions, but from these four, the last one has > been chosen as it seems to be the most appropriate, as it addresses the > "bad" behavior of the driver. > > Signed-off-by: Hubert Wiśniewski <hubert.wisniewski.25632@xxxxxxxxx> > --- > drivers/usb/musb/musb_gadget.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c > index bdf13911a1e5..c6076df0d50c 100644 > --- a/drivers/usb/musb/musb_gadget.c > +++ b/drivers/usb/musb/musb_gadget.c > @@ -1161,12 +1161,19 @@ void musb_free_request(struct usb_ep *ep, struct usb_request *req) > */ > void musb_ep_restart(struct musb *musb, struct musb_request *req) > { > + u16 csr; > + void __iomem *epio = req->ep->hw_ep->regs; > + > trace_musb_req_start(req); > musb_ep_select(musb->mregs, req->epnum); > - if (req->tx) > + if (req->tx) { > txstate(musb, req); > - else > - rxstate(musb, req); > + } else { > + csr = musb_readw(epio, MUSB_RXCSR); > + csr |= MUSB_RXCSR_FLUSHFIFO | MUSB_RXCSR_P_WZC_BITS; > + musb_writew(epio, MUSB_RXCSR, csr); > + musb_writew(epio, MUSB_RXCSR, csr); > + } > } > > static int musb_ep_restart_resume_work(struct musb *musb, void *data) > > base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc > -- > 2.30.2 > > What commit id does this fix? thanks, greg k-h