On 8/10/2024 5:15 AM, Thinh Nguyen wrote: > On Fri, Aug 09, 2024, Thinh Nguyen wrote: >> On Thu, Aug 08, 2024, Selvarasu Ganesan wrote: >>> On 8/8/2024 6:45 AM, Thinh Nguyen wrote: >>>> On Wed, Aug 07, 2024, Selvarasu Ganesan wrote: >>>>> On 8/7/2024 6:08 AM, Thinh Nguyen wrote: >>>>>> On Fri, Jul 19, 2024, Selvarasu Ganesan wrote: >>>>>>> In certain scenarios, there is a chance that the CPU may not be >>>>>>> scheduled the bottom half of dwc3 interrupt. This is because the CPU >>>>>>> may hang up where any work queue lockup has happened for the same CPU >>>>>>> that is trying to schedule the dwc3 thread interrupt. In this scenario, >>>>>>> the USB can enter runtime suspend as the bus may idle for a longer time >>>>>>> , or user can reconnect the USB cable. Then, the dwc3 event interrupt >>>>>>> can be enabled when runtime resume is happening with regardless of the >>>>>>> previous event status. This can lead to a dwc3 IRQ storm due to the >>>>>>> return from the interrupt handler by checking only the evt->flags as >>>>>>> DWC3_EVENT_PENDING, where the same flag was set as DWC3_EVENT_PENDING >>>>>>> in previous work queue lockup. >>>>>>> Let's consider the following sequences in this scenario, >>>>>>> >>>>>>> Call trace of dwc3 IRQ after workqueue lockup scenario >>>>>>> ====================================================== >>>>>>> IRQ #1: >>>>>>> ->dwc3_interrupt() >>>>>>> ->dwc3_check_event_buf() >>>>>>> ->if (evt->flags & DWC3_EVENT_PENDING) >>>>>>> return IRQ_HANDLED; >>>>>>> ->evt->flags |= DWC3_EVENT_PENDING; >>>>>>> ->/* Disable interrupt by setting DWC3_GEVNTSIZ_INTMASK in >>>>>>> DWC3_GEVNTSIZ >>>>>>> ->return IRQ_WAKE_THREAD; // No workqueue scheduled for dwc3 >>>>>>> thread_fu due to workqueue lockup >>>>>>> even after return IRQ_WAKE_THREAD >>>>>>> from top-half. >>>>>>> >>>>>>> Thread #2: >>>>>>> ->dwc3_runtime_resume() >>>>>>> ->dwc3_resume_common() >>>>>>> ->dwc3_gadget_resume() >>>>>>> ->dwc3_gadget_soft_connect() >>>>>>> ->dwc3_event_buffers_setup() >>>>>>> ->/*Enable interrupt by clearing DWC3_GEVNTSIZ_INTMASK in >>>>>>> DWC3_GEVNTSIZ*/ >>>>>>> >>>>>>> Start IRQ Storming after enable dwc3 event in resume path >>>>>>> ========================================================= >>>>>>> CPU0: IRQ >>>>>>> dwc3_interrupt() >>>>>>> dwc3_check_event_buf() >>>>>>> if (evt->flags & DWC3_EVENT_PENDING) >>>>>>> return IRQ_HANDLED; >>>>>>> >>>>>>> CPU0: IRQ >>>>>>> dwc3_interrupt() >>>>>>> dwc3_check_event_buf() >>>>>>> if (evt->flags & DWC3_EVENT_PENDING) >>>>>>> return IRQ_HANDLED; >>>>>>> .. >>>>>>> .. >>>>>>> >>>>>>> To fix this issue by avoiding enabling of the dwc3 event interrupt in >>>>>>> the runtime resume path if dwc3 event processing is in progress. >>>>>>> >>>>>>> Signed-off-by: Selvarasu Ganesan <selvarasu.g@xxxxxxxxxxx> >>>>>>> --- >>>>>>> drivers/usb/dwc3/core.c | 8 ++++++-- >>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>>>>>> index cb82557678dd..610792a70805 100644 >>>>>>> --- a/drivers/usb/dwc3/core.c >>>>>>> +++ b/drivers/usb/dwc3/core.c >>>>>>> @@ -549,8 +549,12 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc) >>>>>>> lower_32_bits(evt->dma)); >>>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0), >>>>>>> upper_32_bits(evt->dma)); >>>>>>> - dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >>>>>>> - DWC3_GEVNTSIZ_SIZE(evt->length)); >>>>>>> + >>>>>>> + /* Skip enable dwc3 event interrupt if event is processing in middle */ >>>>>>> + if (!(evt->flags & DWC3_EVENT_PENDING)) >>>>>>> + dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >>>>>>> + DWC3_GEVNTSIZ_SIZE(evt->length)); >>>>>>> + >>>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); >>>>>>> >>>>>>> return 0; >>>>>>> -- >>>>>>> 2.17.1 >>>>>>> >>>>>> We're not waking up from a hibernation. So after a soft-reset and >>>>>> resume, the events that weren't processed are stale. They should be >>>>>> processed prior to entering suspend or be discarded before resume. >>>>>> >>>>>> The synchronize_irq() during suspend() was not sufficient to prevent >>>>>> this? What are we missing here. >>>>>> >>>>>> Thanks, >>>>>> Thinh >>>>> I don’t think the triggering of interrupt would not be stopped even if >>>>> do soft reset. It's because of event count is may be valid . >>>> Ok. I think I see what you're referring to when you say "event is >>>> processing in the middle" now. >>>> >>>> What you want to check is probably this in dwc3_event_buffers_setup(). >>>> Please confirm: >>>> >>>> if (dwc->pending_events) >>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >>>> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length)); >>>> else >>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), DWC3_GEVNTSIZ_SIZE(evt->length)); >>> Yes, we are expecting the same. But, we must verify the status of >>> evt->flags, which will indicate whether the event is currently >>> processing in middle or not. The below code is for the reference. >>> >>> if (!(evt->flags & DWC3_EVENT_PENDING)) >>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >>> DWC3_GEVNTSIZ_SIZE(evt->length)); >>> else >>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), >>> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length)); >> So, this happens while pending_events is set right? I need to review >> this runtime suspend flow next week. Something doesn't look right. When yes. You are correct. Its happening while pending_events is set. >> there's a suspend/resume runtime or not, there's a soft disconnect. We >> shouldn't be processing any event prior to going into suspend. Also, we > Clarification: I mean we shouldn't process any event that happened prior > to suspend on resume because there was a disconnect. Agree. > >> shouldn't be doing soft-disconnect while connected and in operation >> unless we specifically tell it to. > Thinh