Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs

On 3.3.2020 12.24, Tejas Joglekar wrote:
> Hi,
> On 2/14/2020 2:06 PM, Tejas Joglekar wrote:
>> Hi,
>> On 2/12/2020 8:34 PM, Mathias Nyman wrote:
>>> On 7.2.2020 19.17, Tejas Joglekar wrote:
>>>> Hi,
>>>> Thanks for the review comments.
>>>> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>>>>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>>>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
>>>>>> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
>>>>>> for HS. The controller loads and updates the TRB cache from the transfer
>>>>>> ring in system memory whenever the driver issues a start transfer or
>>>>>> update transfer command.
>>>>>>
>>>>>> For chained TRBs, the Synopsys xHC requires that the total amount of
>>>>>> bytes for all TRBs loaded in the TRB cache be greater than or equal to
>>>>>> 1 MPS, or that the chain ends within the TRB cache (with a last TRB).
>>>>>>
>>>>>> If this requirement is not met, the controller will not be able to send
>>>>>> or receive a packet and it will hang causing a driver timeout and error.
>>>>>>
>>>>>> This can be a problem if a class driver queues SG requests with many
>>>>>> small-buffer entries. The XHCI driver will create a chained TRB for each
>>>>>> entry which may trigger this issue.
>>>>>>
>>>>>> This patch adds logic to the XHCI driver to detect and prevent this from
>>>>>> happening.
>>>>>>
>>>>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
>>>>>> the TRBs, and if the chain continues without making up at least 1 MPS,
>>>>>> we create a bounce buffer to consolidate up to the next (4 * MPS) bytes
>>>>>> of TRBs into the last TRB.
>>>>>>
>>>>>> We check at (TRB_CACHE_SIZE - 2) because it is possible that a link
>>>>>> and/or event data TRB could take up to 2 of the cache entries. And we
>>>>>> consolidate the next (4 * MPS) bytes to improve performance.
>>>>>>
>>>>>> We discovered this issue with devices on other platforms but have not
>>>>>> yet come across any device that triggers it on Linux. Still, it could be
>>>>>> a real problem now or in the future: all it takes is a run of small
>>>>>> chained TRBs. And other instances of the Synopsys IP may have smaller
>>>>>> values of TRB_CACHE_SIZE, which would exacerbate the problem.
>>>>>>
>>>>>> We verified this patch using our internal driver and testing framework.
>>>>>
>>>>>
>>>> I understand that at first glance it looks like a complex solution, but
>>>> please suggest the modifications/changes that would make the patch more
>>>> readable. I have tried to keep the implementation similar to the existing
>>>> bounce buffer implementation, only with the addition of a bounce buffer
>>>> list. As for the spinlock, you can decide whether it is required or not.
>>>
>>> In your case you know the need for a bounce buffer much earlier than in the
>>> existing TD fragment case.
>>>
>>> Have you looked into the struct hc_driver map_urb_for_dma() and
>>> unmap_urb_for_dma() hooks? In those you could detect the need for a bounce
>>> buffer, allocate it, and bluntly copy the entire scattergather buffer to
>>> the bounce buffer. It should be fairly small anyway.
>>>
>> I will look into it, and get back to you. Thanks for the suggestion.
>>  
> I looked into it and I have a question about the approach you suggested.
> I can detect the need for a bounce buffer in the map_urb_for_dma() function,
> allocate it, and bluntly copy into it, but when the SG list holds a few MBs
> of data, I think the bounce buffer allocation might fail over time. Do you
> think that is possible?

Doesn't sound very likely: the sg list would need more than 16 entries, of
which 16 consecutive entries total less than 1024 bytes, and then the
following entries would have to be large enough to fail memory allocation.
(For HS: a total length < 512 bytes for 8 entries, and the rest huge.)

And no real-world device has yet triggered even the first condition of having
16 (8) sg entries with a total length less than the max packet size.
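
To make the condition concrete, here is a rough sketch (untested, with
made-up names; the non-overlapping window is only an approximation of how
the controller fills its TRB cache):

#include <linux/scatterlist.h>

#define TRB_CACHE_SIZE_SS	16	/* default for SuperSpeed */
#define TRB_CACHE_SIZE_HS	8	/* default for HighSpeed */

/*
 * Walk the sg list in windows of (trb_cache_size - 2) entries, leaving
 * 2 slots for a possible link TRB and event data TRB. If a full window
 * totals less than one max packet size while the chain still continues,
 * the controller could stall on this URB.
 */
static bool needs_bounce(struct scatterlist *sgl, int num_sgs,
			 unsigned int mps, unsigned int trb_cache_size)
{
	unsigned int window = trb_cache_size - 2;
	unsigned int total = 0;
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, num_sgs, i) {
		if (i == num_sgs - 1)
			return false;	/* chain ends within the cache */
		total += sg->length;
		if ((i + 1) % window == 0) {
			if (total < mps)
				return true;
			total = 0;
		}
	}
	return false;
}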

> 
> So to avoid that, I thought of keeping a list of such bounce buffers with
> the URB; I could then break the SG list into bounce buffers of some fixed
> size (e.g. 16 KB or 32 KB) and copy the SG into that bounce buffer list.
> 
> Another option is to create a bounce sg: on detecting the bounce buffer
> requirement, combine the small SG entries into a new SG list that satisfies
> the TRB cache requirement of the SNPS controller.
> 
> What do you suggest? Which is the best way to go about solving the problem?

As this is extremely rare (never reported in a real use case), I'd keep this
as simple as possible: allocate one bounce buffer when needed, and copy
everything. If the memory allocation fails (unlikely), print a warning so we
immediately know what to fix.

We are talking about sizes where 16 sg entries have in total 1024 bytes of data.
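
If you want a concrete starting point, the hooks could look roughly like
this (untested sketch; needs_bounce() is the hypothetical check above, and
error unwinding when the core mapping fails is left out):

#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/usb.h>
#include <linux/usb/hcd.h>

struct xhci_sg_bounce {
	struct scatterlist *old_sg;	/* original sg list, restored on unmap */
	int old_num_sgs;
	u8 data[];			/* flat copy of the whole transfer */
};

static int xhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
				gfp_t mem_flags)
{
	unsigned int cache = urb->dev->speed >= USB_SPEED_SUPER ?
			     TRB_CACHE_SIZE_SS : TRB_CACHE_SIZE_HS;
	struct xhci_sg_bounce *bounce;

	if (urb->num_sgs &&
	    needs_bounce(urb->sg, urb->num_sgs,
			 usb_endpoint_maxp(&urb->ep->desc), cache)) {
		bounce = kmalloc(sizeof(*bounce) + urb->transfer_buffer_length,
				 mem_flags);
		if (!bounce) {
			dev_warn(&urb->dev->dev,
				 "xhci: sg bounce buffer allocation failed\n");
			return -ENOMEM;
		}
		if (usb_urb_dir_out(urb))
			sg_copy_to_buffer(urb->sg, urb->num_sgs, bounce->data,
					  urb->transfer_buffer_length);
		bounce->old_sg = urb->sg;
		bounce->old_num_sgs = urb->num_sgs;
		/* hand the core one flat buffer instead of the sg list */
		urb->sg = NULL;
		urb->num_sgs = 0;
		urb->transfer_buffer = bounce->data;
		urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
	}
	return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
}

static void xhci_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
{
	struct xhci_sg_bounce *bounce;

	usb_hcd_unmap_urb_for_dma(hcd, urb);

	if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
		return;

	bounce = container_of(urb->transfer_buffer, struct xhci_sg_bounce,
			      data);
	/* restore the original sg list, copying back on IN transfers */
	urb->sg = bounce->old_sg;
	urb->num_sgs = bounce->old_num_sgs;
	if (usb_urb_dir_in(urb))
		sg_copy_from_buffer(urb->sg, urb->num_sgs, bounce->data,
				    urb->actual_length);
	urb->transfer_buffer = NULL;
	urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
	kfree(bounce);
}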

ehci-tegra.c does something related to what you want: it replaces
urb->transfer_buffer with a DMA-aligned bounce buffer.
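
The sketch above borrows that same pattern: stash the original state next to
the temporary buffer, flag the URB with URB_ALIGNED_TEMP_BUFFER, and undo it
all in the unmap hook.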

-Mathias


