Re: [PATCH 1/3] usb: dwc3: gadget: Prevent losing events in event cache

John Youn <John.Youn@xxxxxxxxxxxx> · Tue, 11 Apr 2017 06:33:17 +0000

On 04/10/2017 11:17 PM, Janusz Dziedzic wrote:
> On 10 April 2017 at 20:43, John Youn <John.Youn@xxxxxxxxxxxx> wrote:
>> On 04/08/2017 12:38 PM, Janusz Dziedzic wrote:
>>> 2017-04-08 1:57 GMT+02:00 Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx>:
>>>> The dwc3 driver can overwite its previous events if its top half IRQ
>>>> handler gets invoked again before processing the events in the cache. We
>>>> see this as a hang in the file transfer and the host will attempt to
>>>> reset the device. This shouldn't be possible in the dwc3 implementation
>>>> if the HW and the SW are in sync. However, through testing and reading
>>>> the pcie trace, the top half IRQ handler occasionally still gets invoked
>>>> one more time after HW interrupt deassertion. We suspect that there is a
>>>> small detection delay in the SW.
>>>>
>>>> Top half IRQ handler deasserts the interrupt line after it gets the
>>>> event count by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. There's a
>>>> small window for a new event coming in between reading the event count
>>>> and the interrupt deassertion. More generally, we will see 0 event
>>>> count, which should not affect anything.
>>>>
>>>> To avoid this issue, we ensure that the events in the cache are
>>>> processed before checking for the new ones. The bottom half IRQ will
>>>> unmask the DWC3_GEVNTSIZ_INTMASK once it finishes processing the events.
>>>> Top half IRQ handler needs to check whether the DWC3_GEVNTSIZ_INTMASK is
>>>> still set before storing the new events. It also should return
>>>> IRQ_HANDLED if the mask is still set since IRQ handler was invoked for
>>>> our usb interrupt.
>>>>
>>>> Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx>
>>>> ---
>>>>  drivers/usb/dwc3/gadget.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index 4dc80729ae11..93d98fb7215e 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -3045,6 +3045,10 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>                 return IRQ_HANDLED;
>>>>         }
>>>>
>>>> +       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>> +       if (reg & DWC3_GEVNTSIZ_INTMASK)
>>>> +               return IRQ_HANDLED;
>>>> +
>>>>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>         count &= DWC3_GEVNTCOUNT_MASK;
>>>>         if (!count)
>>>> @@ -3054,7 +3058,6 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>         evt->flags |= DWC3_EVENT_PENDING;
>>>>
>>>>         /* Mask interrupt */
>>>> -       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>         reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>         dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>
>>>> --
>>>> 2.11.0
>>>>
>>> This seems like a workaround, while when we mask interrupts we should
>>> not get IRQ before BH will done. Current driver implementation base on
>>> this.
>>> We are using 2.60a and didn't see problem you describe. Is it specyfic
>>> for some HW revision?
>>
>> Doesn't seem to be. We can try other HW versions.
>>
>>>
>>> I can imagine we can have the "last" interrupt and you will return
>>> IRQ_HANDLED. Will HW issue this interrupt once again? If not will we
>>> loose this IRQ (eg. disconnect event)?
>>>
>>
>> Yes it will re-assert. The interrupt line will remain asserted as long
>> events remain in the buffer and it is not masked. So when we
>> eventually unmask, the interrupt will be reasserted again immediately.
>>
>>> BTW, other option here could be using (like Baolin propose some time
>>> ago) dwc->lock in top half while this is held for BH.
>>> Question how long spin_lock() will be held in top half...
>>> I am not sure what option is the best.
>>
>> That won't help in this case since you can still have two calls top
>> top-half before a call to bottom. The lock only prevents the two from
>> stepping on each other, but that should already be the case without
>> needing the lock.
>>
> Really can we have two TH calls before BH?
> Interesting case :)

That's what we seem to be seeing. I'm not sure if it is normal or not.

>
> You suggest we have something like this:
> dwc3 IRQ
>    kernel irq_mask
>       dwc3 TH
>          mask dwc3 interrupts
>          get evt->count, evt->cache
>          write to hw (count)
>          return WAKE_THREAD
>    kernel irq_unmask
> a) Before BH start we get another dwc3 IRQ?
> b) CPU0 starts with BH while we get dwc3 IRQ on CPU1?
>
> *) in normal case BH should start and in the end unmask
> dwc3_interrupts and after that we could get new irq
>
> If this could happen then for sure you need dwc->lock protection in TH
> while base on your description CPU0 can execute BH and CPU1 can handle
> TH - so you have to protect dwc3_event_buffer.
>
> Could you describe this more? How to reproduce?

We can reproduce by running a file transfer continuously. It fails
fairly reliably, but it can take a while to hit the condition. We are
using our PCIe based HAPS FPGA on x86_64 using mainline kernel.

Thinh can provide which IP versions we tried with.

> Do you think we introduce this when adding evt->cache in TH?
> Even without evt->cache we still could overwrite evt->count - so, was
> that seen before evt->cache?

The multiple TH calls could have happened even before we introduced
evt->cache, but only with the cache would it have caused a failure due
to overwriting the cached events on multiple calls.

Regards,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html