Re: [PATCH 1/3] usb: dwc3: gadget: Prevent losing events in event cache

John Youn <John.Youn@xxxxxxxxxxxx> · Tue, 11 Apr 2017 16:24:23 +0000

On 04/10/2017 11:58 PM, Felipe Balbi wrote:
>
> Hi,
>
> John Youn <John.Youn@xxxxxxxxxxxx> writes:
>>>> Yes it will re-assert. The interrupt line will remain asserted as long
>>>> events remain in the buffer and it is not masked. So when we
>>>> eventually unmask, the interrupt will be reasserted again immediately.
>>>>
>>>>> BTW, other option here could be using (like Baolin propose some time
>>>>> ago) dwc->lock in top half while this is held for BH.
>>>>> Question how long spin_lock() will be held in top half...
>>>>> I am not sure what option is the best.
>>>>
>>>> That won't help in this case since you can still have two calls top
>>>> top-half before a call to bottom. The lock only prevents the two from
>>>> stepping on each other, but that should already be the case without
>>>> needing the lock.
>>>>
>>> Really can we have two TH calls before BH?
>>> Interesting case :)
>>
>> That's what we seem to be seeing. I'm not sure if it is normal or not.
>
> no, that's not normal. If this is happening, there's either a HW bug
> somewhere, or we're not clearing events properly.
>
> Can you provide tracepoints of the failure? Perhaps add a trace_printk()
> call on both BH and TH just so we see when they're both called.
>
>>> You suggest we have something like this:
>>> dwc3 IRQ
>>>    kernel irq_mask
>>>       dwc3 TH
>>>          mask dwc3 interrupts
>>>          get evt->count, evt->cache
>>>          write to hw (count)
>>>          return WAKE_THREAD
>>>    kernel irq_unmask
>>> a) Before BH start we get another dwc3 IRQ?
>>> b) CPU0 starts with BH while we get dwc3 IRQ on CPU1?
>>>
>>> *) in normal case BH should start and in the end unmask
>>> dwc3_interrupts and after that we could get new irq
>>>
>>> If this could happen then for sure you need dwc->lock protection in TH
>>> while base on your description CPU0 can execute BH and CPU1 can handle
>>> TH - so you have to protect dwc3_event_buffer.
>>>
>>> Could you describe this more? How to reproduce?
>>
>> We can reproduce by running a file transfer continuously. It fails
>> fairly reliably, but it can take a while to hit the condition. We are
>> using our PCIe based HAPS FPGA on x86_64 using mainline kernel.
>
> Okay, please provide tracepoints of the problem in question.
>
>> Thinh can provide which IP versions we tried with.
>
> Thank you
>
>>> Do you think we introduce this when adding evt->cache in TH?
>>> Even without evt->cache we still could overwrite evt->count - so, was
>>> that seen before evt->cache?
>>
>> The multiple TH calls could have happened even before we introduced
>> evt->cache, but only with the cache would it have caused a failure due
>> to overwriting the cached events on multiple calls.
>
> Right, multiple TH calls should NEVER happen, because our IRQ line is
> masked. Unless, and this is a long shot, IRQ line is shared and ANOTHER
> device is causing IRQ to be raised. Can you show the output of:

We thought about this and ensured that there is no sharing of the IRQ.
We still see the dual calls to the top-half.

>
> # grep dwc3 /proc/interrupts
>
> If another device raises the interrupt, then we will get into our TH,
> however we should just return IRQ_HANDLED in that case because we
> shouldn't be generating events.

No, we will still be generating events. The masking of the interrupt
just deasserts the interrupt line. Events are still written out as
usual.

>
> Hmm, can you apply this patch and provide output when the failure
> happens? I suggest setting up a 100MiB trace buffer. You don't need to
> enable any tracers nor any event. trace_printk() is always enabled.

Sure we can do that.

Regards,
John

>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 6f6f0b3be3ad..3b8185d81f9f 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3005,6 +3005,7 @@ static irqreturn_t dwc3_thread_interrupt(int irq, void *_evt)
>  	unsigned long flags;
>  	irqreturn_t ret = IRQ_NONE;
>
> +	trace_printk("BH\n");
>  	spin_lock_irqsave(&dwc->lock, flags);
>  	ret = dwc3_process_event_buf(evt);
>  	spin_unlock_irqrestore(&dwc->lock, flags);
> @@ -3019,6 +3020,8 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  	u32 count;
>  	u32 reg;
>
> +	trace_printk("evt->flags %08x\n", evt->flags);
> +
>  	if (pm_runtime_suspended(dwc->dev)) {
>  		pm_runtime_get(dwc->dev);
>  		disable_irq_nosync(dwc->irq_gadget);
> @@ -3027,7 +3030,9 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  	}
>
>  	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> +	trace_printk("GEVNTCOUNT %08x\n", count);
>  	count &= DWC3_GEVNTCOUNT_MASK;
> +	trace_printk("%u events pending\n", count);
>  	if (!count)
>  		return IRQ_NONE;
>
> @@ -3036,10 +3041,12 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>
>  	/* Mask interrupt */
>  	reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> +	trace_printk("GEVNTSIZ %08x\n", reg);
>  	reg |= DWC3_GEVNTSIZ_INTMASK;
>  	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>
>  	amount = min(count, evt->length - evt->lpos);
> +	trace_printk("copying events to cache\n");
>  	memcpy(evt->cache + evt->lpos, evt->buf + evt->lpos, amount);
>
>  	if (amount < count)
> @@ -3053,7 +3060,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  static irqreturn_t dwc3_interrupt(int irq, void *_evt)
>  {
>  	struct dwc3_event_buffer	*evt = _evt;
> -
> +	trace_printk("TH\n");
>  	return dwc3_check_event_buf(evt);
>  }
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html