Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Wed, 12 Feb 2020 17:04:52 +0200

On 7.2.2020 19.17, Tejas Joglekar wrote:
> Hi,
> Thanks for the review comments.
> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
>>> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
>>> for HS. The controller loads and updates the TRB cache from the transfer
>>> ring in system memory whenever the driver issues a start transfer or
>>> update transfer command.
>>>
>>> For chained TRBs, the Synopsys xHC requires that the total amount of
>>> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
>>> MPS. Or the chain ends within the TRB cache (with a last TRB).
>>>
>>> If this requirement is not met, the controller will not be able to send
>>> or receive a packet and it will hang causing a driver timeout and error.
>>>
>>> This can be a problem if a class driver queues SG requests with many
>>> small-buffer entries. The XHCI driver will create a chained TRB for each
>>> entry which may trigger this issue.
>>>
>>> This patch adds logic to the XHCI driver to detect and prevent this from
>>> happening.
>>>
>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
>>> the TRBs and if the chain continues and we don't make up at least 1 MPS,
>>> we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
>>> into the last TRB.
>>>
>>> We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
>>> be a link and/or event data TRB that take up to 2 of the cache entries.
>>> And we consolidate the next (4 * MPS) to improve performance.
>>>
>>> We discovered this issue with devices on other platforms but have not
>>> yet come across any device that triggers this on Linux. But it could be
>>> a real problem now or in the future. All it takes is N number of small
>>> chained TRBs. And other instances of the Synopsys IP may have smaller
>>> values for the TRB_CACHE_SIZE which would exacerbate the problem.
>>>
>>> We verified this patch using our internal driver and testing framework.
>>
>>
> I understand that in a first look it looks a complex solution, but you can suggest
> the modifications/changes which would be required to make the patch more readable.
> I have tried to keep the implementation similar to bounce buffer implementation 
> only with addition of bounce buffer list. For the spinlock case, you can take a 
> call if it is required or not.

In your case you know the need for a bounce buffer much earlier than in the
existing TD fragment case.

Have you looked into the struct hc_driver map_urb_for_dma() and
unmap_urb_for_dma() hooks? In those you could detect the need for a bounce
buffer, allocate it, and bluntly copy entire scattergather buffer to the 
bounce buffer. It should be fairly small anyway.

-Mathias