On Fri, Sep 06, 2024, Selvarasu Ganesan wrote: > > On 9/6/2024 4:48 AM, Thinh Nguyen wrote: > > On Fri, Sep 06, 2024, Selvarasu Ganesan wrote: > >> On 9/6/2024 2:43 AM, Thinh Nguyen wrote: > >>> On Thu, Sep 05, 2024, Selvarasu Ganesan wrote: > >>>> On 9/5/2024 5:56 AM, Thinh Nguyen wrote: > >>>>> On Wed, Sep 04, 2024, Selvarasu Ganesan wrote: > >>>>>> On 9/4/2024 6:33 AM, Thinh Nguyen wrote: > >>>>>>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote: > >>>>>>>> I would like to reconfirm from our end that in our failure scenario, we > >>>>>>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3 > >>>>>>>> resume sequence is executed, and the dwc->pending_events flag is not > >>>>>>>> being set. > >>>>>>>> > >>>>>>> If the controller is stopped, no event is generated until it's restarted > >>>>>>> again. (ie, you should not see GEVNTCOUNT updated after clearing > >>>>>>> DCTL.run_stop). If there's no event, no interrupt assertion should come > >>>>>>> from the controller. > >>>>>>> > >>>>>>> If the pending_events is not set and you still see this failure, then > >>>>>>> likely that the controller had started, and the interrupt is generated > >>>>>>> from the controller event. This occurs along with the interrupt > >>>>>>> generated from your connection notification from your setup. > >>>>>> I completely agree. My discussion revolves around the handling of the > >>>>>> DWC3_EVENT_PENDING flag in all situations. The purpose of using this > >>>>>> flag is to prevent the processing of new events if an existing event is > >>>>>> still being processed. This flag is set in the top-half interrupt > >>>>>> handler and cleared at the end of the bottom-half handler. > >>>>>> > >>>>>> Now, let's consider scenarios where the bottom half is not scheduled, > >>>>>> and a USB reconnect occurs. In this case, there is a possibility that > >>>>>> the interrupt line is unmasked in dwc3_event_buffers_setup, and the USB > >>>>>> controller begins posting new events. The top-half interrupt handler > >>>>>> checks for the DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without > >>>>>> processing any new events. However, the USB controller continues to post > >>>>>> interrupts until they are acknowledged. > >>>>>> > >>>>>> Please review the complete sequence once with DWC3_EVENT_PENDING flag. > >>>>>> > >>>>>> My proposal is to clear or reset the DWC3_EVENT_PENDING flag when > >>>>>> unmasking the interrupt line dwc3_event_buffers_setup, apart from > >>>>>> bottom-half handler. Clearing the DWC3_EVENT_PENDING flag in > >>>>>> dwc3_event_buffers_setup does not cause any harm, as we have implemented > >>>>>> a temporary workaround in our test setup to prevent IRQ storms. > >>>>>> > >>>>>> > >>>>>> > >>>>>> Working scenarios: > >>>>>> ================== > >>>>>> 1. Top-half handler: > >>>>>> a. if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> b. Set DWC3_EVENT_PENDING flag > >>>>>> c. Masking interrupt line > >>>>>> > >>>>>> 2. Bottom-half handler: > >>>>>> a. Un-masking interrupt line > >>>>>> b. Clear DWC3_EVENT_PENDING flag > >>>>>> > >>>>>> Failure scenarios: > >>>>>> ================== > >>>>>> 1. Top-half handler: > >>>>>> a. if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> b. Set DWC3_EVENT_PENDING flag > >>>>>> c. Masking interrupt line > >>>>> For DWC3_EVENT_PENDING flag to be set at this point (before we start the > >>>>> controller), that means that the GEVNTCOUNT was not 0 after > >>>>> soft-disconnect and that the pm_runtime_suspended() must be false. > >>>> In the top-half code where we set the DWC3_EVENT_PENDING flag, we > >>>> acknowledge GEVNTCOUNT. Therefore, I think it is not necessary for > >>>> GEVNTCOUNT to have a non-zero value until a new event occurs. In fact, > >>>> when we tried to print GEVNTCOUNT in a non-interrupt context, we found > >>>> that it was zero, where we received DWC3_EVENT_PENDING being set in > >>>> non-interrupt context. > >>> For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you > >>> see it's zero, that means that it was already decremented by the driver. > >>> > >>> If the driver acknowledges the GEVNTCOUNT, then that means that the > >>> events are copied and prepared to be processed. The bottom-half thread > >>> is scheduled. If it's for stale event, I don't want it to be processed. > >>> > >>>>>> 2. No Bottom-half scheduled: > >>>>> Why is the bottom-half not scheduled? Or do you mean it hasn't woken up > >>>>> yet before the next top-half coming? > >>>> In very rare cases, it is possible in our platform that the CPU may not > >>>> be able to schedule the bottom half of the dwc3 interrupt because a work > >>>> queue lockup has occurred on the same CPU that is attempting to schedule > >>>> the dwc3 thread interrupt. In this case Yes, the bottom-half handler > >>>> hasn't woken up, then initiate an IRQ storm for new events after the > >>>> controller restarts, resulting in no more bottom-half scheduling due to > >>>> the CPU being stuck in processing continuous interrupts and return > >>>> IRQ_HANDLED by checking if (evt->flags & DWC3_EVENT_PENDING). > >>>> > >>>>>> 3. USB reconnect: dwc3_event_buffers_setup > >>>>>> a. Un-masking interrupt line > >>>>> Do we know that the GEVNTCOUNT is non-zero before starting the > >>>>> controller again? > >>>> The GEVNTCOUNT value showing as zero that we confirmed by adding debug > >>>> message here. > >>>>>> 4. Continuous interrupts : Top-half handler: > >>>>>> a. if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> > >>>>>> a. if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> > >>>>>> a. if (evt->flags & DWC3_EVENT_PENDING) > >>>>>> return IRQ_HANDLED; > >>>>>> ..... > >>>>>> > >>>>>> ..... > >>>>>> > >>>>>> ..... > >>>>>> > >>>> Sure, I can try implementing the proposed code modifications in our > >>>> testing environment. > >>>> > >>>> But, I am uncertain about how these changes will effectively prevent an > >>>> IRQ storm when the USB controller sequence restarts with the > >>>> DWC3_EVENT_PENDING. The following code will only execute until the > >>>> DWC3_EVENT_PENDING is cleared, at which point the previous bottom-half > >>>> will not be scheduled. > >>>> > >>>> Please correct me if i am wrong in my above understanding. > >>> As I mentioned, I don't want DWC3_EVENT_PENDING flag to be set due to > >>> the stale event. I want to ignore and skip processing any stale event. > >>> > >>> The DWC3_EVENT_PENDING should not be set by the time > >>> dwc3_event_buffers_setup() is called. > >>> > >>> Specifically review this condition in my testing patch: > >>> > >>> /* > >>> * If the controller is halted, the event count is stale/invalid. Ignore > >>> * them. This happens if the interrupt assertion is from an out-of-band > >>> * resume notification. > >>> */ > >>> if (!dwc->pullups_connected && count) { > >>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count); > >>> return IRQ_HANDLED; > >>> } > >>> > >>> Let me know if the condition matches with what's happening for your > >>> case. > >> Hi Thinh, > >> > >> Thanks for your continuous reviews and suggestions. > >> > >> The given condition also will not matches in our case. > >> As i mentioned in starting of this thread please refer once the below > >> link of older discussion for similar issue from Samsung.. > >> > >> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@xxxxxxxxxxx/__;!!A4F2R9G_pg!a3VpPHvMr9enk0YPjSoWJ12Kr5Hw2Ka43Q_wi80lw6ty2tJT4hKRKsCnQNdqbVS3JORK2VwqdoXDWz1q8ynpe7Ex6cU$ > >> > >> > >> DWC3_EVENT_PENDING flags set when count is 0. > >> It means "There are no interrupts to handle.". > >> > >> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 ( > >> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000, > >> (void *) cache = 0xFFFFFF8839F54080, > >> (unsigned int) length = 0x1000, > >> (unsigned int) lpos = 0x0, > >> *(unsigned int) count = 0x0, (unsigned int) flags = 0x00000001,* > >> (dma_addr_t) dma = 0x00000008BD7D7000, > >> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880, > >> (u64) android_kabi_reserved1 = 0x0), > > > > This is the info of the event buffer that was reset after the > > dwc3_event_buffers_setup(). I'm talking about the first time > > DWC3_EVENT_PENDING flag was set. > > Yes, the buffer that was reset before as part of > dwc3_event_buffers_setup() is correct. > I agree on your new code changes in below will prevent setting > DWC3_EVENT_PENDING and avoid the bottom-half handler if the controller > is halted, and the event count is invalid. > > Are you suspecting that the DWC3_EVENT_PENDING flag was only set in this > scenario in our failure case? Base on the discussion so far, that's what I'm suspecting. > > /*diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 89fc690fdf34..a525f7ea5886 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -4490,6 +4490,17 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) > return IRQ_HANDLED; > > count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); > + > + /* > + * If the controller is halted, the event count is stale/invalid. Ignore > + * them. This happens if the interrupt assertion is from an out-of-band > + * resume notification. > + */ > + if (!dwc->pullups_connected && count) { > + dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count); > + return IRQ_HANDLED; > + } > + > count &= DWC3_GEVNTCOUNT_MASK; > if (!count) > return IRQ_NONE; > > > > > By the time the interrupt storm below occur, the count in the buffer is > > already zero'ed out. > > > >> IRQ Storm: > >> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1), > >> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3), > >> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1), > >> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3), > >> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1), > >> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3), > >> ... > >> ... > >> ... > >> > >> > >> We are also fine with below code changes as you suggested earlier. > >> https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@xxxxxxxxxxxx/__;!!A4F2R9G_pg!a3VpPHvMr9enk0YPjSoWJ12Kr5Hw2Ka43Q_wi80lw6ty2tJT4hKRKsCnQNdqbVS3JORK2VwqdoXDWz1q8ynp367zvEw$ > >> > >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > >> index 65500246323b..3c36dfdb88f0 100644 > >> --- a/drivers/usb/dwc3/gadget.c > >> +++ b/drivers/usb/dwc3/gadget.c > >> @@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct > >> dwc3_event_buffer *evt) > >> * irq event handler completes before caching new event to prevent > >> * losing events. > >> */ > >> - if (evt->flags & DWC3_EVENT_PENDING) > >> + if (evt->flags & DWC3_EVENT_PENDING) { > >> + if (!evt->count) { > >> + u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); > >> + > >> + if (!(reg & DWC3_GEVNTSIZ_INTMASK)) > >> + evt->flags &= ~DWC3_EVENT_PENDING; > >> + } > >> return IRQ_HANDLED; > >> + } > >> > >> > > I don't want the bottom-half to be scheduled in the beginning as it may > > come before the cleanup in dwc3_event_buffers_setup(). > You mean the above changes for clearing DWC3_EVENT_PENDINGnot required > as you given new change will prevent setting of DWC3_EVENT_PENDING > before dwc3_event_buffers_setup(). Yes. > But I dont see any harm in above code changes for clearing > DWC3_EVENT_PENDING if it already set with evt->count=0. You can add it there, but do we need to if we can solve the actual issue? I'm interested in confirming my suspiction of what's really causing the DWC3_EVENT_PENDING to be set here. The code logic would be clearer rather than masking the behavior by depending on the reset by the dwc3_event_buffers_setup(). The runtime resume doesn't share the same locking mechanism as when processing an event. While it may be unlikely, I don't want the bottom-half thread to handle stale event or race with the runtime resume. > > Anyway I will try the your new proposed changes alone on our testing > setup and will update the status, > Thanks, Thinh