Hi, On 11/30/20 8:30 PM, Tom Yan wrote: > Hmm, I wonder if I/we wrongly assumed that the dma_dev used for the > hw_max_sectors clamping in __scsi_init_queue() is wrong. > > So instead of adding a fallback else-clause here or using "sysdev" as > dma_dev like in the current upstream code, maybe we should actually do > a three-way min: the "changed" hw_max_sectors, dma_max_mapping_size of > dma_dev("dev") and dma_max_mapping_size of sysdev...? So both here and in the 2/2 patch thread there are lots of open questions, which to me suggests that for 5.10 we really should just go with the 3 reverts which I suggested earlier. Regards, Hans > > On Mon, 30 Nov 2020 at 17:48, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: >> >> Hi, >> >> On 11/28/20 4:48 PM, Tom Yan wrote: >>> Apparently the former (with the chosen dma_dev) may cause problem in certain >>> case (e.g. where thunderbolt dock and intel iommu are involved). The error >>> observed was: >>> >>> XHCI swiotlb buffer is full / DMAR: Device bounce map failed >>> >>> For now we retain the clamp for hw_max_sectors against the dma_max_mapping_size. >>> Since the device/size for the clamp that is applied when the scsi request queue >>> is initialized/allocated is different than the one used here, we invalidate the >>> early clamping by making a fallback blk_queue_max_hw_sectors() call. >>> >>> Signed-off-by: Tom Yan <tom.ty89@xxxxxxxxx> >> >> I can confirm that this fixes the network performance on a Lenovo Thunderbolt >> dock generation 2, which uses an USB attach NIC. >> >> With this patch added on top of 5.10-rc5 scp performance to another machine >> on the local gbit LAN goes back from the regressed 1 MB/s to its original 100MB/s >> as it should be: >> >> Tested-by: Hans de Goede <hdegoede@xxxxxxxxxx> >> >> Regards, >> >> Hans >> >> >>> --- >>> drivers/usb/storage/uas.c | 11 +++++++---- >>> 1 file changed, 7 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c >>> index c8a577309e8f..5db1325cea20 100644 >>> --- a/drivers/usb/storage/uas.c >>> +++ b/drivers/usb/storage/uas.c >>> @@ -843,18 +843,21 @@ static int uas_slave_alloc(struct scsi_device *sdev) >>> static int uas_slave_configure(struct scsi_device *sdev) >>> { >>> struct uas_dev_info *devinfo = sdev->hostdata; >>> - struct device *dev = sdev->host->dma_dev; >>> + struct usb_device *udev = devinfo->udev; >>> >>> if (devinfo->flags & US_FL_MAX_SECTORS_64) >>> blk_queue_max_hw_sectors(sdev->request_queue, 64); >>> else if (devinfo->flags & US_FL_MAX_SECTORS_240) >>> blk_queue_max_hw_sectors(sdev->request_queue, 240); >>> - else if (devinfo->udev->speed >= USB_SPEED_SUPER) >>> + else if (udev->speed >= USB_SPEED_SUPER) >>> blk_queue_max_hw_sectors(sdev->request_queue, 2048); >>> + else >>> + blk_queue_max_hw_sectors(sdev->request_queue, >>> + SCSI_DEFAULT_MAX_SECTORS); >>> >>> blk_queue_max_hw_sectors(sdev->request_queue, >>> min_t(size_t, queue_max_hw_sectors(sdev->request_queue), >>> - dma_max_mapping_size(dev) >> SECTOR_SHIFT)); >>> + dma_max_mapping_size(udev->bus->sysdev) >> SECTOR_SHIFT)); >>> >>> if (devinfo->flags & US_FL_NO_REPORT_OPCODES) >>> sdev->no_report_opcodes = 1; >>> @@ -1040,7 +1043,7 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id) >>> shost->can_queue = devinfo->qdepth - 2; >>> >>> usb_set_intfdata(intf, shost); >>> - result = scsi_add_host_with_dma(shost, &intf->dev, udev->bus->sysdev); >>> + result = scsi_add_host(shost, &intf->dev); >>> if (result) >>> goto free_streams; >>> >>> >> >