On 04/10/2017 11:58 PM, Felipe Balbi wrote: > > Hi, > > John Youn <John.Youn@xxxxxxxxxxxx> writes: >>>> Yes it will re-assert. The interrupt line will remain asserted as long >>>> events remain in the buffer and it is not masked. So when we >>>> eventually unmask, the interrupt will be reasserted again immediately. >>>> >>>>> BTW, other option here could be using (like Baolin propose some time >>>>> ago) dwc->lock in top half while this is held for BH. >>>>> Question how long spin_lock() will be held in top half... >>>>> I am not sure what option is the best. >>>> >>>> That won't help in this case since you can still have two calls top >>>> top-half before a call to bottom. The lock only prevents the two from >>>> stepping on each other, but that should already be the case without >>>> needing the lock. >>>> >>> Really can we have two TH calls before BH? >>> Interesting case :) >> >> That's what we seem to be seeing. I'm not sure if it is normal or not. > > no, that's not normal. If this is happening, there's either a HW bug > somewhere, or we're not clearing events properly. > > Can you provide tracepoints of the failure? Perhaps add a trace_printk() > call on both BH and TH just so we see when they're both called. > >>> You suggest we have something like this: >>> dwc3 IRQ >>> kernel irq_mask >>> dwc3 TH >>> mask dwc3 interrupts >>> get evt->count, evt->cache >>> write to hw (count) >>> return WAKE_THREAD >>> kernel irq_unmask >>> a) Before BH start we get another dwc3 IRQ? >>> b) CPU0 starts with BH while we get dwc3 IRQ on CPU1? >>> >>> *) in normal case BH should start and in the end unmask >>> dwc3_interrupts and after that we could get new irq >>> >>> If this could happen then for sure you need dwc->lock protection in TH >>> while base on your description CPU0 can execute BH and CPU1 can handle >>> TH - so you have to protect dwc3_event_buffer. >>> >>> Could you describe this more? How to reproduce? >> >> We can reproduce by running a file transfer continuously. It fails >> fairly reliably, but it can take a while to hit the condition. We are >> using our PCIe based HAPS FPGA on x86_64 using mainline kernel. > > Okay, please provide tracepoints of the problem in question. > >> Thinh can provide which IP versions we tried with. > > Thank you > >>> Do you think we introduce this when adding evt->cache in TH? >>> Even without evt->cache we still could overwrite evt->count - so, was >>> that seen before evt->cache? >> >> The multiple TH calls could have happened even before we introduced >> evt->cache, but only with the cache would it have caused a failure due >> to overwriting the cached events on multiple calls. > > Right, multiple TH calls should NEVER happen, because our IRQ line is > masked. Unless, and this is a long shot, IRQ line is shared and ANOTHER > device is causing IRQ to be raised. Can you show the output of: We thought about this and ensured that there is no sharing of the IRQ. We still see the dual calls to the top-half. > > # grep dwc3 /proc/interrupts > > If another device raises the interrupt, then we will get into our TH, > however we should just return IRQ_HANDLED in that case because we > shouldn't be generating events. No, we will still be generating events. The masking of the interrupt just deasserts the interrupt line. Events are still written out as usual. > > Hmm, can you apply this patch and provide output when the failure > happens? I suggest setting up a 100MiB trace buffer. You don't need to > enable any tracers nor any event. trace_printk() is always enabled. Sure we can do that. Regards, John > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 6f6f0b3be3ad..3b8185d81f9f 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -3005,6 +3005,7 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt) > unsigned long flags; > irqreturn_t ret = IRQ_NONE; > > + trace_printk("BH\n"); > spin_lock_irqsave(&dwc->lock, flags); > ret = dwc3_process_event_buf(evt); > spin_unlock_irqrestore(&dwc->lock, flags); > @@ -3019,6 +3020,8 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) > u32 count; > u32 reg; > > + trace_printk("evt->flags %08x\n", evt->flags); > + > if (pm_runtime_suspended(dwc->dev)) { > pm_runtime_get(dwc->dev); > disable_irq_nosync(dwc->irq_gadget); > @@ -3027,7 +3030,9 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) > } > > count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); > + trace_printk("GEVNTCOUNT %08x\n", count); > count &= DWC3_GEVNTCOUNT_MASK; > + trace_printk("%u events pending\n", count); > if (!count) > return IRQ_NONE; > > @@ -3036,10 +3041,12 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) > > /* Mask interrupt */ > reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); > + trace_printk("GEVNTSIZ %08x\n", reg); > reg |= DWC3_GEVNTSIZ_INTMASK; > dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg); > > amount = min(count, evt->length - evt->lpos); > + trace_printk("copying events to cache\n"); > memcpy(evt->cache + evt->lpos, evt->buf + evt->lpos, amount); > > if (amount < count) > @@ -3053,7 +3060,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) > static irqreturn_t dwc3_interrupt(int irq, void *_evt) > { > struct dwc3_event_buffer *evt = _evt; > - > + trace_printk("TH\n"); > return dwc3_check_event_buf(evt); > } > > > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html