On 04/11/2017 11:14 PM, Felipe Balbi wrote: > > Hi, > > John Youn <John.Youn@xxxxxxxxxxxx> writes: >>>> John Youn <John.Youn@xxxxxxxxxxxx> writes: >>>>>> Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes: >>>>>>> The dwc3 driver can overwite its previous events if its top half IRQ >>>>>>> handler gets invoked again before processing the events in the cache. We >>>>>> >>>>>> interrupts are masked, why would top half get invoked again? Is this, >>>>>> perhaps, related to DWC3 3.00a which has the "Interrupt line doesn't >>>>>> lower when masked" problem? We've added a lot of code to workaround that >>>>>> problem and, apparently, it wasn't enough. >>>>> >>>>> No, it is not related to that. We verified with PCIe traces. The >>>>> interrupt line gets deasserted after we mask it. And we put the >>>>> masking as close to the beginning of the top-half as possible. >>>>> >>>>>> >>>>>> In any case, there's no way top half would be invoked again in a >>>>>> properly working DWC3. >>>>> >>>>> Yet we still see it sometimes. Usually it doesn't create a problem, >>>> >>>> that's fair, but it's not for the reason you're describing :-) There >>>> might be another problem going on, because since we masked the interrupt >>>> and cleared all events, IRQ shouldn't be raised at all; unless, as I >>>> mentioned on the other subthread, the IRQ line is shared. >>>> >>>>> but if there happens to be a new event there, we get the failure. >>>>> >>>>> We didn't trace into that part of the kernel so we can't explain why. >>>>> But if there is any chance the interrupt line deassertion wasn't >>>>> detected in time, whatever part of the kernel that thinks it is still >>>>> asserted might just call our top-half again. This could be a totally >>>>> wrong assumption, but it doesn't seem too far-fetched. >>>> >>>> The kernel doesn't detect IRQ line assertion/deassertion. CPU gets an >>>> exception when that happens and calls Kernel IRQ handler vector. That >>>> will, in turn, figure out which line triggered, call the handler and so >>>> on. >>> >>> We're talking about PCIe though, where interrupt assertion and >>> deassertion are packets. So I would imagine the kernel has to do >>> something and there could be some latency associated with that. >> >> Also, another thing is that the device uses legacy, level-triggered, >> PCIe interrupts, so for as long as the interrupt is asserted, the TH >> is called repeatedly. > > yes, and that's why we have: > >> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) >> { >> struct dwc3 *dwc = evt->dwc; >> u32 amount; >> u32 count; >> u32 reg; > >> if (pm_runtime_suspended(dwc->dev)) { >> pm_runtime_get(dwc->dev); >> disable_irq_nosync(dwc->irq_gadget); >> dwc->pending_events = true; >> return IRQ_HANDLED; >> } >> >> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); >> count &= DWC3_GEVNTCOUNT_MASK; > > check how many events are pending in the event buffer. > >> if (!count) >> return IRQ_NONE; >> >> evt->count = count; >> evt->flags |= DWC3_EVENT_PENDING; >> >> /* Mask interrupt */ >> reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); >> reg |= DWC3_GEVNTSIZ_INTMASK; > > mask interrupt generation > >> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg); >> >> amount = min(count, evt->length - evt->lpos); >> memcpy(evt->cache + evt->lpos, evt->buf + evt->lpos, amount); >> >> if (amount < count) >> memcpy(evt->cache, evt->buf, count - amount); >> >> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count); > > clear ALL events from event buffer. This brings the line down, so we > shouldn't re-enter. Right, I get all that, but you're not addressing the fact that the "interrupts" are just packets on the PCIe and there could be some latency in detecting the de-assertion. The "line" doesn't go down immediately when we mask it. I don't see any obvious way to guarantee that it will be de-asserted by the time we exit TH. There is both an "assert" and "de-assert" packet. Between those, the TH gets called continuously. You can see multiple TH before BH behavior just by omitting the mask and leaving it asserted. > >> return IRQ_WAKE_THREAD; >> } > >> So we mask the interrupt in the TH and a short time later, the >> interrupt de-assertion packet is sent on PCIe bus and if that's not >> seen right away we may already have another call to TH before the BH >> gets scheduled. > > not sure this can happen. If that's the case, every PCI driver would > have all sorts of tricks to cope with this, not only dwc3 :-) Well, the DWC3 is now doing something destructive in the TH. Otherwise multiple TH calls wouldn't have affected anything. We should probably use a queue for the events instead. I think most drivers should be robust against this type of thing, as you never know when you can get an interrupt, and "spurious" interrupts are a thing. > > Bjorn, is this something that can happen on PCIe? > > Quick summary of the problem: > > John and Thinh are experiencing a re-entrant top-half handler even > though we have cleared pending IRQ status _and_ masked Interrupts. SNPS > is using an FPGA model of the latest DWC3 core under x86. > > I have never seen this behavior on ARM or any of the x86 devices > containing this core (and this includes all the newest x86 cores, see > drivers/usb/dwc3/dwc3-pci.c for PCI IDs if you care enough :-) > > Anyway, from my point of view, this is either a bug in IRQ subsystem > which only John and Thinh can reproduce at this moment, or a regression > with DWC3 IP Core :-s > I don't think it is necessarily either of those things. I think the fundamental behavior was always like this but adding the event cache and having a destructive TH, requiring a single TH before BH caused the issue to come up. We can try with various older IP versions tomorrow to try to confirm this. Regards, John -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html