On 04/10/2017 11:17 PM, Janusz Dziedzic wrote: > On 10 April 2017 at 20:43, John Youn <John.Youn@xxxxxxxxxxxx> wrote: >> On 04/08/2017 12:38 PM, Janusz Dziedzic wrote: >>> 2017-04-08 1:57 GMT+02:00 Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx>: >>>> The dwc3 driver can overwite its previous events if its top half IRQ >>>> handler gets invoked again before processing the events in the cache. We >>>> see this as a hang in the file transfer and the host will attempt to >>>> reset the device. This shouldn't be possible in the dwc3 implementation >>>> if the HW and the SW are in sync. However, through testing and reading >>>> the pcie trace, the top half IRQ handler occasionally still gets invoked >>>> one more time after HW interrupt deassertion. We suspect that there is a >>>> small detection delay in the SW. >>>> >>>> Top half IRQ handler deasserts the interrupt line after it gets the >>>> event count by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. There's a >>>> small window for a new event coming in between reading the event count >>>> and the interrupt deassertion. More generally, we will see 0 event >>>> count, which should not affect anything. >>>> >>>> To avoid this issue, we ensure that the events in the cache are >>>> processed before checking for the new ones. The bottom half IRQ will >>>> unmask the DWC3_GEVNTSIZ_INTMASK once it finishes processing the events. >>>> Top half IRQ handler needs to check whether the DWC3_GEVNTSIZ_INTMASK is >>>> still set before storing the new events. It also should return >>>> IRQ_HANDLED if the mask is still set since IRQ handler was invoked for >>>> our usb interrupt. >>>> >>>> Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx> >>>> --- >>>> drivers/usb/dwc3/gadget.c | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>>> index 4dc80729ae11..93d98fb7215e 100644 >>>> --- a/drivers/usb/dwc3/gadget.c >>>> +++ b/drivers/usb/dwc3/gadget.c >>>> @@ -3045,6 +3045,10 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) >>>> return IRQ_HANDLED; >>>> } >>>> >>>> + reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); >>>> + if (reg & DWC3_GEVNTSIZ_INTMASK) >>>> + return IRQ_HANDLED; >>>> + >>>> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); >>>> count &= DWC3_GEVNTCOUNT_MASK; >>>> if (!count) >>>> @@ -3054,7 +3058,6 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) >>>> evt->flags |= DWC3_EVENT_PENDING; >>>> >>>> /* Mask interrupt */ >>>> - reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); >>>> reg |= DWC3_GEVNTSIZ_INTMASK; >>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg); >>>> >>>> -- >>>> 2.11.0 >>>> >>> This seems like a workaround, while when we mask interrupts we should >>> not get IRQ before BH will done. Current driver implementation base on >>> this. >>> We are using 2.60a and didn't see problem you describe. Is it specyfic >>> for some HW revision? >> >> Doesn't seem to be. We can try other HW versions. >> >>> >>> I can imagine we can have the "last" interrupt and you will return >>> IRQ_HANDLED. Will HW issue this interrupt once again? If not will we >>> loose this IRQ (eg. disconnect event)? >>> >> >> Yes it will re-assert. The interrupt line will remain asserted as long >> events remain in the buffer and it is not masked. So when we >> eventually unmask, the interrupt will be reasserted again immediately. >> >>> BTW, other option here could be using (like Baolin propose some time >>> ago) dwc->lock in top half while this is held for BH. >>> Question how long spin_lock() will be held in top half... >>> I am not sure what option is the best. >> >> That won't help in this case since you can still have two calls top >> top-half before a call to bottom. The lock only prevents the two from >> stepping on each other, but that should already be the case without >> needing the lock. >> > Really can we have two TH calls before BH? > Interesting case :) That's what we seem to be seeing. I'm not sure if it is normal or not. > > You suggest we have something like this: > dwc3 IRQ > kernel irq_mask > dwc3 TH > mask dwc3 interrupts > get evt->count, evt->cache > write to hw (count) > return WAKE_THREAD > kernel irq_unmask > a) Before BH start we get another dwc3 IRQ? > b) CPU0 starts with BH while we get dwc3 IRQ on CPU1? > > *) in normal case BH should start and in the end unmask > dwc3_interrupts and after that we could get new irq > > If this could happen then for sure you need dwc->lock protection in TH > while base on your description CPU0 can execute BH and CPU1 can handle > TH - so you have to protect dwc3_event_buffer. > > Could you describe this more? How to reproduce? We can reproduce by running a file transfer continuously. It fails fairly reliably, but it can take a while to hit the condition. We are using our PCIe based HAPS FPGA on x86_64 using mainline kernel. Thinh can provide which IP versions we tried with. > Do you think we introduce this when adding evt->cache in TH? > Even without evt->cache we still could overwrite evt->count - so, was > that seen before evt->cache? The multiple TH calls could have happened even before we introduced evt->cache, but only with the cache would it have caused a failure due to overwriting the cached events on multiple calls. Regards, John -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html