On 8/7/2024 6:08 AM, Thinh Nguyen wrote: > On Fri, Jul 19, 2024, Selvarasu Ganesan wrote: >> In certain scenarios, there is a chance that the CPU may not be >> scheduled the bottom half of dwc3 interrupt. This is because the CPU >> may hang up where any work queue lockup has happened for the same CPU >> that is trying to schedule the dwc3 thread interrupt. In this scenario, >> the USB can enter runtime suspend as the bus may idle for a longer time >> , or user can reconnect the USB cable. Then, the dwc3 event interrupt >> can be enabled when runtime resume is happening with regardless of the >> previous event status. This can lead to a dwc3 IRQ storm due to the >> return from the interrupt handler by checking only the evt->flags as >> DWC3_EVENT_PENDING, where the same flag was set as DWC3_EVENT_PENDING >> in previous work queue lockup. >> Let's consider the following sequences in this scenario, >> >> Call trace of dwc3 IRQ after workqueue lockup scenario >> ====================================================== >> IRQ #1: >> ->dwc3_interrupt() >> ->dwc3_check_event_buf() >> ->if (evt->flags & DWC3_EVENT_PENDING) >> return IRQ_HANDLED; >> ->evt->flags |= DWC3_EVENT_PENDING; >> ->/* Disable interrupt by setting DWC3_GEVNTSIZ_INTMASK in >> DWC3_GEVNTSIZ >> ->return IRQ_WAKE_THREAD; // No workqueue scheduled for dwc3 >> thread_fu due to workqueue lockup >> even after return IRQ_WAKE_THREAD >> from top-half. >> >> Thread #2: >> ->dwc3_runtime_resume() >> ->dwc3_resume_common() >> ->dwc3_gadget_resume() >> ->dwc3_gadget_soft_connect() >> ->dwc3_event_buffers_setup() >> ->/*Enable interrupt by clearing DWC3_GEVNTSIZ_INTMASK in >> DWC3_GEVNTSIZ*/ >> >> Start IRQ Storming after enable dwc3 event in resume path >> ========================================================= >> CPU0: IRQ >> dwc3_interrupt() >> dwc3_check_event_buf() >> if (evt->flags & DWC3_EVENT_PENDING) >> return IRQ_HANDLED; >> >> CPU0: IRQ >> dwc3_interrupt() >> dwc3_check_event_buf() >> if (evt->flags & DWC3_EVENT_PENDING) >> return IRQ_HANDLED; >> .. >> .. >> >> To fix this issue by avoiding enabling of the dwc3 event interrupt in >> the runtime resume path if dwc3 event processing is in progress. >> >> Signed-off-by: Selvarasu Ganesan <selvarasu.g@xxxxxxxxxxx> >> --- >> drivers/usb/dwc3/core.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >> index cb82557678dd..610792a70805 100644 >> --- a/drivers/usb/dwc3/core.c >> +++ b/drivers/usb/dwc3/core.c >> @@ -549,8 +549,12 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc) >> lower_32_bits(evt->dma)); >> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0), >> upper_32_bits(evt->dma)); >> - dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >> - DWC3_GEVNTSIZ_SIZE(evt->length)); >> + >> + /* Skip enable dwc3 event interrupt if event is processing in middle */ >> + if (!(evt->flags & DWC3_EVENT_PENDING)) >> + dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >> + DWC3_GEVNTSIZ_SIZE(evt->length)); >> + >> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); >> >> return 0; >> -- >> 2.17.1 >> > We're not waking up from a hibernation. So after a soft-reset and > resume, the events that weren't processed are stale. They should be > processed prior to entering suspend or be discarded before resume. > > The synchronize_irq() during suspend() was not sufficient to prevent > this? What are we missing here. > > Thanks, > Thinh I don’t think the triggering of interrupt would not be stopped even if do soft reset. It's because of event count is may be valid . Let consider the scenarios where SW is not acknowledge the event count after getting a interrupt with disable GEVNTSIZ_INTMASK =0. It will triggering the spurious interrupts until enable GEVNTSIZ_INTMASK=1 or acknowledge the event count by SW. This is happening here because of We just return from interrupt handler by checking if evt->flags as DWC3_EVENT_PENDING. Clearing of DWC3_EVENT_PENDING flag is done in dwc3_thread_interrupt. In some scenario it's not happening due to cpu hangup or work queue lockup. The same issue was reported earlier and not derived actual root cause from USB dwc3 driver point of view, and somehow we managing fix with our vendor kernel by using kretprobe. To fix this issue, We have to reset the evt->flags or stop disable GEVNTSIZ_INTMASK=0 https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@xxxxxxxxxxx/ https://lore.kernel.org/linux-usb/1646630679-121585-1-git-send-email-jh0801.jung@xxxxxxxxxxx/ Thanks Selva