Hi Selvarasu, On Fri, Aug 30, 2024, Selvarasu Ganesan wrote: > > On 8/10/2024 5:12 AM, Thinh Nguyen wrote: > > On Thu, Aug 08, 2024, Selvarasu Ganesan wrote: > >> On 8/8/2024 6:45 AM, Thinh Nguyen wrote: > >>> On Wed, Aug 07, 2024, Selvarasu Ganesan wrote: > >>>> On 8/7/2024 6:08 AM, Thinh Nguyen wrote: > >>>>> On Fri, Jul 19, 2024, Selvarasu Ganesan wrote: > >>>>>> In certain scenarios, there is a chance that the CPU may not be > >>>>>> scheduled the bottom half of dwc3 interrupt. This is because the CPU > >>>>>> may hang up where any work queue lockup has happened for the same CPU > >>>>>> that is trying to schedule the dwc3 thread interrupt. In this scenario, > >>>>>> the USB can enter runtime suspend as the bus may idle for a longer time > >>>>>> , or user can reconnect the USB cable. Then, the dwc3 event interrupt > >>>>>> can be enabled when runtime resume is happening with regardless of the > >>>>>> previous event status. This can lead to a dwc3 IRQ storm due to the > >>>>>> return from the interrupt handler by checking only the evt->flags as > >>>>>> DWC3_EVENT_PENDING, where the same flag was set as DWC3_EVENT_PENDING > >>>>>> in previous work queue lockup. > >>>>>> Let's consider the following sequences in this scenario, > >>>>>> > >>>>>> Call trace of dwc3 IRQ after workqueue lockup scenario > >>>>>> ====================================================== > >>>>>> IRQ #1: > >>>>>> ->dwc3_interrupt() > >>>>>> ->dwc3_check_event_buf() > >>>>>> ->if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> ->evt->flags |= DWC3_EVENT_PENDING; > >>>>>> ->/* Disable interrupt by setting DWC3_GEVNTSIZ_INTMASK in > >>>>>> DWC3_GEVNTSIZ > >>>>>> ->return IRQ_WAKE_THREAD; // No workqueue scheduled for dwc3 > >>>>>> thread_fu due to workqueue lockup > >>>>>> even after return IRQ_WAKE_THREAD > >>>>>> from top-half. > >>>>>> > >>>>>> Thread #2: > >>>>>> ->dwc3_runtime_resume() > >>>>>> ->dwc3_resume_common() > >>>>>> ->dwc3_gadget_resume() > >>>>>> ->dwc3_gadget_soft_connect() > >>>>>> ->dwc3_event_buffers_setup() > >>>>>> ->/*Enable interrupt by clearing DWC3_GEVNTSIZ_INTMASK in > >>>>>> DWC3_GEVNTSIZ*/ > >>>>>> > >>>>>> Start IRQ Storming after enable dwc3 event in resume path > >>>>>> ========================================================= > >>>>>> CPU0: IRQ > >>>>>> dwc3_interrupt() > >>>>>> dwc3_check_event_buf() > >>>>>> if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> > >>>>>> CPU0: IRQ > >>>>>> dwc3_interrupt() > >>>>>> dwc3_check_event_buf() > >>>>>> if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> .. > >>>>>> .. > >>>>>> > >>>>>> To fix this issue by avoiding enabling of the dwc3 event interrupt in > >>>>>> the runtime resume path if dwc3 event processing is in progress. > >>>>>> > >>>>>> Signed-off-by: Selvarasu Ganesan <selvarasu.g@xxxxxxxxxxx> > >>>>>> --- > >>>>>> drivers/usb/dwc3/core.c | 8 ++++++-- > >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > >>>>>> index cb82557678dd..610792a70805 100644 > >>>>>> --- a/drivers/usb/dwc3/core.c > >>>>>> +++ b/drivers/usb/dwc3/core.c > >>>>>> @@ -549,8 +549,12 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc) > >>>>>> lower_32_bits(evt->dma)); > >>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0), > >>>>>> upper_32_bits(evt->dma)); > >>>>>> - dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > >>>>>> - DWC3_GEVNTSIZ_SIZE(evt->length)); > >>>>>> + > >>>>>> + /* Skip enable dwc3 event interrupt if event is processing in middle */ > >>>>>> + if (!(evt->flags & DWC3_EVENT_PENDING)) > >>>>>> + dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > >>>>>> + DWC3_GEVNTSIZ_SIZE(evt->length)); > >>>>>> + > >>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); > >>>>>> > >>>>>> return 0; > >>>>>> -- > >>>>>> 2.17.1 > >>>>>> > >>>>> We're not waking up from a hibernation. So after a soft-reset and > >>>>> resume, the events that weren't processed are stale. They should be > >>>>> processed prior to entering suspend or be discarded before resume. > >>>>> > >>>>> The synchronize_irq() during suspend() was not sufficient to prevent > >>>>> this? What are we missing here. > >>>>> > >>>>> Thanks, > >>>>> Thinh > >>>> I don’t think the triggering of interrupt would not be stopped even if > >>>> do soft reset. It's because of event count is may be valid . > >>> Ok. I think I see what you're referring to when you say "event is > >>> processing in the middle" now. > >>> > >>> What you want to check is probably this in dwc3_event_buffers_setup(). > >>> Please confirm: > >>> > >>> if (dwc->pending_events) > >>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > >>> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length)); > >>> else > >>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), DWC3_GEVNTSIZ_SIZE(evt->length)); > >> Yes, we are expecting the same. But, we must verify the status of > >> evt->flags, which will indicate whether the event is currently > >> processing in middle or not. The below code is for the reference. > >> > >> if (!(evt->flags & DWC3_EVENT_PENDING)) > >> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > >> DWC3_GEVNTSIZ_SIZE(evt->length)); > >> else > >> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > >> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length)); > > So, this happens while pending_events is set right? I need to review > > this runtime suspend flow next week. Something doesn't look right. When > > there's a suspend/resume runtime or not, there's a soft disconnect. We > > shouldn't be processing any event prior to going into suspend. Also, we > > shouldn't be doing soft-disconnect while connected and in operation > > unless we specifically tell it to. > HI Thinh, > > Would you be able to review this runtime suspend flow? > > From our end, after conducting multiple regression tests, we have > determined that the resetting of "evt->flags" are sufficient when > attempting to enable event IRQ masks instead of enable event IRQ mask > based on pending event flags. There is a possibility that reconnecting > USB with the host PC may cause event interrupts to be missed by the CPU > if disable event IRQ mask. So, The fix should be as follow. Could you > please review this once from your end? > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > index ccc3895dbd7f..3b2441608e9e 100644 > --- a/drivers/usb/dwc3/core.c > +++ b/drivers/usb/dwc3/core.c > @@ -554,6 +554,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc) > lower_32_bits(evt->dma)); > dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0), > upper_32_bits(evt->dma)); > + > + /* > + * The DWC3_EVENT_PENDING flag is cleared if it has > + * already been set when enabling the event IRQ mask > + * to prevent possibly of an IRQ storm. > + */ > + if (evt->flags & DWC3_EVENT_PENDING) > + evt->flags &= ~DWC3_EVENT_PENDING; > + > dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), > DWC3_GEVNTSIZ_SIZE(evt->length)); > dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); > Sorry for the delay response. In addition to that, please rework and rename dwc3_gadget_process_pending_event(). We're not supposed to handle any left-over event. So remove the manual calls to the interrupt handlers there. On runtime suspend, the device is soft disconnected. So any interrupt assertion to notify a new connection must be a custom configuration of your platform. No event should be generated while the run_stop bit is cleared. On runtime resume, we will initiate soft-reset and soft-connect to restore the run_stop bit. A new connection event will be generated then. Thanks, Thinh