Hi,

On 2/12/2020 8:34 PM, Mathias Nyman wrote:
> On 7.2.2020 19.17, Tejas Joglekar wrote:
>> Hi,
>> Thanks for the review comments.
>> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE
>>>> for each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS
>>>> and 8 for HS. The controller loads and updates the TRB cache from
>>>> the transfer ring in system memory whenever the driver issues a
>>>> start transfer or update transfer command.
>>>>
>>>> For chained TRBs, the Synopsys xHC requires that the total number of
>>>> bytes for all TRBs loaded in the TRB cache be greater than or equal
>>>> to 1 MPS, or that the chain ends within the TRB cache (with a last
>>>> TRB).
>>>>
>>>> If this requirement is not met, the controller will not be able to
>>>> send or receive a packet and it will hang, causing a driver timeout
>>>> and error.
>>>>
>>>> This can be a problem if a class driver queues SG requests with many
>>>> small-buffer entries. The xHCI driver will create a chained TRB for
>>>> each entry, which may trigger this issue.
>>>>
>>>> This patch adds logic to the xHCI driver to detect and prevent this
>>>> from happening.
>>>>
>>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size
>>>> of the TRBs, and if the chain continues and we don't make up at
>>>> least 1 MPS, we create a bounce buffer to consolidate up to the next
>>>> (4 * MPS) TRBs into the last TRB.
>>>>
>>>> We check at (TRB_CACHE_SIZE - 2) because a link and/or event data
>>>> TRB could take up to 2 of the cache entries, and we consolidate the
>>>> next (4 * MPS) to improve performance.
>>>>
>>>> We discovered this issue with devices on other platforms but have
>>>> not yet come across any device that triggers this on Linux. But it
>>>> could be a real problem now or in the future. All it takes is N
>>>> small chained TRBs. And other instances of the Synopsys IP may have
>>>> smaller values for TRB_CACHE_SIZE, which would exacerbate the
>>>> problem.
>>>>
>>>> We verified this patch using our internal driver and testing
>>>> framework.
>>>
>>>
>> I understand that at first look it seems like a complex solution, but
>> you can suggest the modifications/changes that would be required to
>> make the patch more readable. I have tried to keep the implementation
>> similar to the existing bounce buffer implementation, only with the
>> addition of a bounce buffer list. For the spinlock case, you can take
>> a call on whether it is required or not.
>
> In your case you know the need for a bounce buffer much earlier than
> in the existing TD fragment case.
>
> Have you looked into the struct hc_driver map_urb_for_dma() and
> unmap_urb_for_dma() hooks? In those you could detect the need for a
> bounce buffer, allocate it, and bluntly copy the entire scatter-gather
> buffer to the bounce buffer. It should be fairly small anyway.
>

I will look into it and get back to you. Thanks for the suggestion.

> -Mathias
>

Regards,
Tejas Joglekar
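
As a reference for the discussion above, a rough sketch of the kind of
per-window check the quoted commit message describes. This is
illustrative only and based on the description, not on the actual
patch; xhci_sg_needs_bounce() and the TRB_CACHE_SIZE_* defines are
hypothetical names.

/*
 * Illustrative sketch: walk the URB's scatter-gather list in windows
 * of (TRB_CACHE_SIZE - 2) entries and flag the transfer if a window
 * whose chain still continues adds up to less than one max packet.
 */
#include <linux/scatterlist.h>
#include <linux/usb.h>

#define TRB_CACHE_SIZE_SS	16	/* assumed SS default */
#define TRB_CACHE_SIZE_HS	8	/* assumed HS default */

static bool xhci_sg_needs_bounce(struct urb *urb,
				 unsigned int trb_cache_size)
{
	unsigned int mps = usb_endpoint_maxp(&urb->ep->desc);
	unsigned int len = 0, count = 0;
	struct scatterlist *sg;
	int i;

	for_each_sg(urb->sg, sg, urb->num_sgs, i) {
		len += sg->length;
		if (++count < trb_cache_size - 2)
			continue;
		/*
		 * A full cache window holds less than 1 MPS and the
		 * chain does not end here: the controller could stall.
		 */
		if (len < mps && i != (int)urb->num_sgs - 1)
			return true;
		len = 0;
		count = 0;
	}
	return false;
}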
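
And a minimal sketch of the map_urb_for_dma() route suggested above,
reusing the hypothetical helper from the previous sketch. The core
helpers (usb_hcd_map_urb_for_dma(), sg_copy_to_buffer()) exist in the
kernel, but the bookkeeping a matching unmap_urb_for_dma() would need
to restore the SG list, copy IN data back, and free the bounce buffer
is only noted in a comment, not implemented.

#include <linux/slab.h>
#include <linux/usb.h>
#include <linux/usb/hcd.h>

static int xhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
				gfp_t mem_flags)
{
	void *bounce;

	/* URBs without a problematic SG list take the normal path. */
	if (!urb->num_sgs || !xhci_sg_needs_bounce(urb, TRB_CACHE_SIZE_SS))
		return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);

	bounce = kzalloc(urb->transfer_buffer_length, mem_flags);
	if (!bounce)
		return -ENOMEM;

	/* OUT data is copied up front; IN data would be copied back
	 * in unmap_urb_for_dma().
	 */
	if (usb_urb_dir_out(urb))
		sg_copy_to_buffer(urb->sg, urb->num_sgs, bounce,
				  urb->transfer_buffer_length);

	/*
	 * Present the transfer as one linear buffer to the core DMA
	 * mapping code. The original sg pointer and the bounce buffer
	 * would have to be remembered somewhere (e.g. a per-URB lookup)
	 * so unmap_urb_for_dma() can undo this and free the buffer.
	 */
	urb->transfer_buffer = bounce;
	urb->sg = NULL;
	urb->num_sgs = 0;

	return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
}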