Hi, On 5/18/2020 9:51 PM, Mathias Nyman wrote: > On 21.4.2020 12.49, Tejas Joglekar wrote: >> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for >> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8 >> for HS. The controller loads and updates the TRB cache from the transfer >> ring in system memory whenever the driver issues a start transfer or >> update transfer command. >> >> For chained TRBs, the Synopsys xHC requires that the total amount of >> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1 >> MPS. Or the chain ends within the TRB cache (with a last TRB). >> >> If this requirement is not met, the controller will not be able to send >> or receive a packet and it will hang causing a driver timeout and error. >> >> This can be a problem if a class driver queues SG requests with many >> small-buffer entries. The XHCI driver will create a chained TRB for each >> entry which may trigger this issue. >> >> This patch adds logic to the XHCI driver to detect and prevent this from >> happening. >> >> For every (TRB_CACHE_SIZE - 2), we check the total buffer size of >> the SG list and if the last window of (TRB_CACHE_SIZE - 2) SG list length >> and we don't make up at least 1 MPS, we create a temporary buffer to >> consolidate full SG list into the buffer. >> >> We check at (TRB_CACHE_SIZE - 2) window because it is possible that there >> would be a link and/or event data TRB that take up to 2 of the cache >> entries. >> >> We discovered this issue with devices on other platforms but have not >> yet come across any device that triggers this on Linux. But it could be >> a real problem now or in the future. All it takes is N number of small >> chained TRBs. And other instances of the Synopsys IP may have smaller >> values for the TRB_CACHE_SIZE which would exacerbate the problem. >> >> Signed-off-by: Tejas Joglekar <joglekar@xxxxxxxxxxxx> >> --- >> Changes in v2: >> - Removed redundant debug messages >> - Modified logic to remove unnecessary changes in hcd.c >> - Rename the quirk >> >> drivers/usb/host/xhci-ring.c | 2 +- >> drivers/usb/host/xhci.c | 125 +++++++++++++++++++++++++++++++++++++++++++ >> drivers/usb/host/xhci.h | 4 ++ >> 3 files changed, 130 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >> index a78787bb5133..2fad9474912a 100644 >> --- a/drivers/usb/host/xhci-ring.c >> +++ b/drivers/usb/host/xhci-ring.c >> @@ -3291,7 +3291,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, >> >> full_len = urb->transfer_buffer_length; >> /* If we have scatter/gather list, we use it. */ >> - if (urb->num_sgs) { >> + if (urb->num_sgs && !(urb->transfer_flags & URB_DMA_MAP_SINGLE)) { >> num_sgs = urb->num_mapped_sgs; >> sg = urb->sg; >> addr = (u64) sg_dma_address(sg); >> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c >> index fe38275363e0..15f06bc6b1ad 100644 >> --- a/drivers/usb/host/xhci.c >> +++ b/drivers/usb/host/xhci.c >> @@ -1256,6 +1256,106 @@ EXPORT_SYMBOL_GPL(xhci_resume); >> >> /*-------------------------------------------------------------------------*/ >> >> +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) >> +{ >> + void *temp; >> + int ret = 0; >> + unsigned int len; >> + unsigned int buf_len; >> + enum dma_data_direction dir; >> + struct xhci_hcd *xhci; >> + >> + xhci = hcd_to_xhci(hcd); >> + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; >> + buf_len = urb->transfer_buffer_length; >> + >> + temp = kzalloc_node(buf_len, GFP_ATOMIC, >> + dev_to_node(hcd->self.sysdev)); >> + >> + if (usb_urb_dir_out(urb)) >> + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, >> + temp, buf_len, 0); >> + >> + urb->transfer_buffer = temp; >> + urb->transfer_dma = dma_map_single(hcd->self.sysdev, >> + urb->transfer_buffer, >> + urb->transfer_buffer_length, >> + dir); >> + >> + if (dma_mapping_error(hcd->self.sysdev, >> + urb->transfer_dma)) { >> + ret = -EAGAIN; >> + kfree(temp); >> + } else { >> + urb->transfer_flags |= URB_DMA_MAP_SINGLE; > > Not sure if using URB_DMA_MAP_SINGLE to flag that this urb is boucebuffered is > appropriate. > > If Greg or Alan don't object then it's fine for me as well. > > > @Greg/Alan do you suggest any change for the flag here? >> + } >> + >> + return ret; >> +} >> + >> +static bool xhci_urb_temp_buffer_required(struct usb_hcd *hcd, >> + struct urb *urb) >> +{ >> + bool ret = false; >> + unsigned int i; >> + unsigned int len = 0; >> + unsigned int buf_len; >> + unsigned int trb_size; >> + unsigned int max_pkt; >> + struct scatterlist *sg; >> + struct scatterlist *tail_sg; >> + >> + sg = urb->sg; >> + tail_sg = urb->sg; >> + buf_len = urb->transfer_buffer_length; >> + max_pkt = usb_endpoint_maxp(&urb->ep->desc); >> + >> + if (!urb->num_sgs) >> + return ret; >> + >> + if (urb->dev->speed >= USB_SPEED_SUPER) >> + trb_size = TRB_CACHE_SIZE_SS; >> + else >> + trb_size = TRB_CACHE_SIZE_HS; >> + >> + if (urb->transfer_buffer_length != 0 && >> + !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP)) { >> + for_each_sg(urb->sg, sg, urb->num_sgs, i) { >> + len = len + sg->length; >> + if (i > trb_size - 2) { >> + len = len - tail_sg->length; >> + if (len < max_pkt) { >> + ret = true; >> + break; >> + } >> + >> + tail_sg = sg_next(tail_sg); >> + } >> + } >> + } >> + return ret; >> +} >> + >> +static void xhci_unmap_temp_buf(struct urb *urb) >> +{ >> + struct scatterlist *sg; >> + unsigned int len; >> + unsigned int buf_len; >> + >> + sg = urb->sg; >> + buf_len = urb->transfer_buffer_length; >> + >> + if (usb_urb_dir_in(urb)) { >> + len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, >> + urb->transfer_buffer, >> + buf_len, >> + 0); >> + } >> + >> + kfree(urb->transfer_buffer); >> + urb->transfer_buffer = NULL; > > clear URB_DMA_MAP_SINGLE from urb->transfer_flags? > In current implmentation the dma_unmap is done with existing function usb_hcd_unmap_urb_for_dma() and then buffer is copied in the xhci_unmap_temp_buf(), so URB_DMA_MAP_SINGLE flag gets cleared with usb_hcd_unmap_urb_for_dma(). I think in next patch set I will add logic for dma_unmap in xhci_unmap_temp_buf() function. > Everything else looks good do me. > -Mathias > Thanks & Regards, Tejas Joglekar