Re: [PATCH 1/3] usb: dwc3: gadget: Prevent losing events in event cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/09/2017 11:59 PM, Felipe Balbi wrote:
>
> Hi,
>
> Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes:
>> The dwc3 driver can overwite its previous events if its top half IRQ
>> handler gets invoked again before processing the events in the cache. We
>
> interrupts are masked, why would top half get invoked again? Is this,
> perhaps, related to DWC3 3.00a which has the "Interrupt line doesn't
> lower when masked" problem? We've added a lot of code to workaround that
> problem and, apparently, it wasn't enough.

No, it is not related to that. We verified with PCIe traces. The
interrupt line gets deasserted after we mask it. And we put the
masking as close to the beginning of the top-half as possible.

>
> In any case, there's no way top half would be invoked again in a
> properly working DWC3.

Yet we still see it sometimes. Usually it doesn't create a problem,
but if there happens to be a new event there, we get the failure.

We didn't trace into that part of the kernel so we can't explain why.
But if there is any chance the interrupt line deassertion wasn't
detected in time, whatever part of the kernel that thinks it is still
asserted might just call our top-half again. This could be a totally
wrong assumption, but it doesn't seem too far-fetched.

If you have any insight on how to debug this, we can investigate this
more.


>
>> see this as a hang in the file transfer and the host will attempt to
>> reset the device. This shouldn't be possible in the dwc3 implementation
>> if the HW and the SW are in sync. However, through testing and reading
>
> it's not about SW and HW being in sync, it's about the HW being quirky,
> right?

No, we don't see any problem in the hardware. Here, being out of
"sync" just means the device hangs because of lost events. There are a
number of pretty bad failure scenarios that could happen due to this.

>
>> the pcie trace, the top half IRQ handler occasionally still gets invoked
>> one more time after HW interrupt deassertion. We suspect that there is a
>> small detection delay in the SW.
>
> so you don't really know what's going on :-) Why send a patch if you
> don't understand the problem fully?
>
>> Top half IRQ handler deasserts the interrupt line after it gets the
>> event count by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. There's a
>> small window for a new event coming in between reading the event count
>> and the interrupt deassertion. More generally, we will see 0 event
>> count, which should not affect anything.
>
> this new event can come, and it'll remain in the event buffer. We have
> masked the IRQ line and this should be enough to make sure IRQ isn't
> generated again. Why is the IRQ raising again? Which kernel version are
> you testing? Which DWC3 revision? Do you have interrupt moderation?

It's not about the new event. We see the case where the top-half is
called again just after we mask the interrupt without any events too.
That's probably the majority of the time it occurs. It's just that if
this case happens to coincide with a new event, everything gets
screwed up.

An alternative fix is to not allow the top-half to overwrite the event
cache. Just append and update it. The top and bottom are already
mutually exclusive so it should be fine.

Regards,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux