On 04/18/2017 05:47 AM, Felipe Balbi wrote: > > Hi, > > Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes: >>> (Thinh, for whatever I didn't receive your email via the list, replying >>> to myself) >> >> Could it be because of the attached file? >> >>> >>> Felipe Balbi <balbi@xxxxxxxxxx> writes: >>>>> Hmm, can you apply this patch and provide output when the failure >>>>> happens? I suggest setting up a 100MiB trace buffer. You don't need to >>>>> enable any tracers nor any event. trace_printk() is always enabled. >>>> >>>> Sure. >>>> >>>> Attached is a 7z with the full kernel trace with the failure. >>> >>> Thank you :-) >>> >>>> More often than not, we will see something like below, where the TH >>>> coming in back to back with 0 event count on the second TH. >>>> >>>> 2990.496588: dwc3_interrupt: TH >>>> 2990.496588: dwc3_interrupt: evt->flags 00000000 >>>> 2990.496590: dwc3_readl: addr ffffc90001b2840c value 00000004 >>>> 2990.496590: dwc3_interrupt: GEVNTCOUNT 00000004 >>>> 2990.496590: dwc3_interrupt: 4 events pending >>>> 2990.496591: dwc3_readl: addr ffffc90001b28408 value 00001000 >>>> 2990.496591: dwc3_interrupt: GEVNTSIZ 00001000 >>>> 2990.496592: dwc3_writel: addr ffffc90001b28408 value 80001000 >>>> 2990.496592: dwc3_interrupt: copying events to cache >>>> 2990.496592: dwc3_writel: addr ffffc90001b2840c value 00000004 >>>> 2990.496614: dwc3_interrupt: TH >>>> 2990.496614: dwc3_interrupt: evt->flags 00000001 >>>> 2990.496615: dwc3_readl: addr ffffc90001b2840c value 00000000 >>>> 2990.496615: dwc3_interrupt: GEVNTCOUNT 00000000 >>>> 2990.496616: dwc3_interrupt: 0 events pending >>>> 2990.496616: dwc3_thread_interrupt: BH >>>> 2990.496616: dwc3_event: event (00006084): ep1out: Transfer In-Progress >>>> >>>> >>>> Here is the snippet of the attached trace where the TH is called twice >>>> with a new event coming in and overwrite the old one: >>>> >>>> 2990.601357: dwc3_readl: addr ffffc90001b2883c value 00030007 >>>> 2990.601357: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [196615].. >>>> 2990.601360: dwc3_interrupt: TH >>>> 2990.601360: dwc3_interrupt: evt->flags 00000000 >>>> 2990.601362: dwc3_readl: addr ffffc90001b2840c value 00000004 >>>> 2990.601362: dwc3_interrupt: GEVNTCOUNT 00000004 >>>> 2990.601362: dwc3_interrupt: 4 events pending >>>> 2990.601364: dwc3_readl: addr ffffc90001b28408 value 00001000 >>>> 2990.601364: dwc3_interrupt: GEVNTSIZ 00001000 >>>> 2990.601364: dwc3_writel: addr ffffc90001b28408 value 80001000 >>>> 2990.601364: dwc3_interrupt: copying events to cache >>>> 2990.601364: dwc3_writel: addr ffffc90001b2840c value 00000004 >>>> 2990.601369: dwc3_interrupt: TH >>>> 2990.601369: dwc3_interrupt: evt->flags 00000001 >>>> 2990.601369: dwc3_readl: addr ffffc90001b2840c value 00000004 >>>> 2990.601370: dwc3_interrupt: GEVNTCOUNT 00000004 >>>> 2990.601370: dwc3_interrupt: 4 events pending >>>> 2990.601372: dwc3_readl: addr ffffc90001b28408 value 80001000 >>>> 2990.601372: dwc3_interrupt: GEVNTSIZ 80001000 >>>> 2990.601372: dwc3_writel: addr ffffc90001b28408 value 80001000 >>>> 2990.601372: dwc3_interrupt: copying events to cache >>>> 2990.601372: dwc3_writel: addr ffffc90001b2840c value 00000004 >>>> 2990.601375: dwc3_thread_interrupt: BH >>>> 2990.601376: dwc3_event: event (00004086): ep1in: Transfer In-Progress.. >>>> >>>> The device hangs at the end of the log. >> >> I repeated the test with USB 3.0 IP version 2.90a and USB 3.1 IP version >> 1.60, and I found the same result where the TH is occasionally called >> twice back-to-back. >> >>> >>> Right, found this in the logs. So the only thing that I can think of >>> here, is that we need to flush posted writes. Does this help? >>> >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>> index 79e7a3480d51..6f06738273f2 100644 >>> --- a/drivers/usb/dwc3/gadget.c >>> +++ b/drivers/usb/dwc3/gadget.c >>> @@ -3099,6 +3099,10 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) >>> >>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count); >>> >>> + /* flush posted writes */ >>> + dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); >>> + dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); >>> + >>> return IRQ_WAKE_THREAD; >>> } >>> >>> >> >> With the change above, I don't see the TH get called twice in in a row. >> However, I also tested with replacing the code above with a udelay(4), >> from reading the kernel trace, to take in account the latency of reading >> the registers. I also don't see TH get called twice in a row. >> >> The extra latency from reading the registers above can mask this issue. > > okay, so it is a problem with posted writes. Can you send above as a > formal patch? That's something we can backport to every stable > kernel. And the real reason here _IS_ caused by posted writes. There's > no spurious interrupt happening. > Hi Felipe, I only mentioned spurious interrupts as something the driver needs to account for anyways. They are not the cause of this issue. And I don't think the PCIe posted writes are either becuase of what I said before about the interrupts being packets and the latency in assertion and detection. But anyways, after thorough testing we don't see this in our system anymore with the read-back (probably due to the 4 us delay that's introduced). We can submit a patch for this for stable. However I don't think this is a sufficient fix for mainline, and dwc3 should be able to handle multiple interrupts before BH, whatever the reason for it. How do you want to handle mainline? Either Thinh's original patch, create a queue for the events, or some other way? We can work on this fix too if you want. Thanks, John -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html