Hi,

On 2/12/2020 8:34 PM, Mathias Nyman wrote:
> On 7.2.2020 19.17, Tejas Joglekar wrote:
>> Hi,
>> Thanks for the review comments.
>> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE
>>>> for each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS
>>>> and 8 for HS. The controller loads and updates the TRB cache from
>>>> the transfer ring in system memory whenever the driver issues a
>>>> start transfer or update transfer command.
>>>>
>>>> For chained TRBs, the Synopsys xHC requires that the total number of
>>>> bytes for all TRBs loaded in the TRB cache be greater than or equal
>>>> to 1 MPS, or that the chain ends within the TRB cache (with a last
>>>> TRB).
>>>>
>>>> If this requirement is not met, the controller will not be able to
>>>> send or receive a packet and it will hang, causing a driver timeout
>>>> and error.
>>>>
>>>> This can be a problem if a class driver queues SG requests with many
>>>> small-buffer entries. The xHCI driver will create a chained TRB for
>>>> each entry, which may trigger this issue.
>>>>
>>>> This patch adds logic to the xHCI driver to detect and prevent this
>>>> from happening.
>>>>
>>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size
>>>> of the TRBs, and if the chain continues and we don't make up at
>>>> least 1 MPS, we create a bounce buffer to consolidate up to the next
>>>> (4 * MPS) TRBs into the last TRB.
>>>>
>>>> We check at (TRB_CACHE_SIZE - 2) because a link and/or event data
>>>> TRB could take up to 2 of the cache entries, and we consolidate the
>>>> next (4 * MPS) to improve performance.
>>>>
>>>> We discovered this issue with devices on other platforms but have
>>>> not yet come across any device that triggers this on Linux. But it
>>>> could be a real problem now or in the future. All it takes is N
>>>> small chained TRBs. And other instances of the Synopsys IP may have
>>>> smaller values for TRB_CACHE_SIZE, which would exacerbate the
>>>> problem.
>>>>
>>>> We verified this patch using our internal driver and testing
>>>> framework.
>>>
>>>
>> I understand that at first look it seems like a complex solution, but
>> you can suggest the modifications/changes that would be required to
>> make the patch more readable. I have tried to keep the implementation
>> similar to the existing bounce buffer implementation, only with the
>> addition of a bounce buffer list. For the spinlock case, you can take
>> a call on whether it is required or not.
>
> In your case you know the need for a bounce buffer much earlier than
> in the existing TD fragment case.
>
> Have you looked into the struct hc_driver map_urb_for_dma() and
> unmap_urb_for_dma() hooks? In those you could detect the need for a
> bounce buffer, allocate it, and bluntly copy the entire scatter-gather
> buffer to the bounce buffer. It should be fairly small anyway.
>

I will look into it and get back to you. Thanks for the suggestion.

> -Mathias
>

Regards,
Tejas Joglekar
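
As a reference for the discussion above, a rough sketch of the kind of
per-window check the quoted commit message describes. This is
illustrative only and based on the description, not on the actual
patch; xhci_sg_needs_bounce() and the TRB_CACHE_SIZE_* defines are
hypothetical names.

/*
 * Illustrative sketch: walk the URB's scatter-gather list in windows
 * of (TRB_CACHE_SIZE - 2) entries and flag the transfer if a window
 * whose chain still continues adds up to less than one max packet.
 */
#include <linux/scatterlist.h>
#include <linux/usb.h>

#define TRB_CACHE_SIZE_SS	16	/* assumed SS default */
#define TRB_CACHE_SIZE_HS	8	/* assumed HS default */

static bool xhci_sg_needs_bounce(struct urb *urb,
				 unsigned int trb_cache_size)
{
	unsigned int mps = usb_endpoint_maxp(&urb->ep->desc);
	unsigned int len = 0, count = 0;
	struct scatterlist *sg;
	int i;

	for_each_sg(urb->sg, sg, urb->num_sgs, i) {
		len += sg->length;
		if (++count < trb_cache_size - 2)
			continue;
		/*
		 * A full cache window holds less than 1 MPS and the
		 * chain does not end here: the controller could stall.
		 */
		if (len < mps && i != (int)urb->num_sgs - 1)
			return true;
		len = 0;
		count = 0;
	}
	return false;
}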
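
And a minimal sketch of the map_urb_for_dma() route suggested above,
reusing the hypothetical helper from the previous sketch. The core
helpers (usb_hcd_map_urb_for_dma(), sg_copy_to_buffer()) exist in the
kernel, but the bookkeeping a matching unmap_urb_for_dma() would need
to restore the SG list, copy IN data back, and free the bounce buffer
is only noted in a comment, not implemented.

#include <linux/slab.h>
#include <linux/usb.h>
#include <linux/usb/hcd.h>

static int xhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
				gfp_t mem_flags)
{
	void *bounce;

	/* URBs without a problematic SG list take the normal path. */
	if (!urb->num_sgs || !xhci_sg_needs_bounce(urb, TRB_CACHE_SIZE_SS))
		return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);

	bounce = kzalloc(urb->transfer_buffer_length, mem_flags);
	if (!bounce)
		return -ENOMEM;

	/* OUT data is copied up front; IN data would be copied back
	 * in unmap_urb_for_dma().
	 */
	if (usb_urb_dir_out(urb))
		sg_copy_to_buffer(urb->sg, urb->num_sgs, bounce,
				  urb->transfer_buffer_length);

	/*
	 * Present the transfer as one linear buffer to the core DMA
	 * mapping code. The original sg pointer and the bounce buffer
	 * would have to be remembered somewhere (e.g. a per-URB lookup)
	 * so unmap_urb_for_dma() can undo this and free the buffer.
	 */
	urb->transfer_buffer = bounce;
	urb->sg = NULL;
	urb->num_sgs = 0;

	return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
}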