On Tue, Oct 31, 2023 at 5:35 AM Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Mon, Oct 30, 2023 at 04:23:20PM -0700, Bjorn Andersson wrote:
> > During USB transfers on the SC8280XP __arm_smmu_tlb_sync() is seen to
> > typically take 1-2ms to complete. As expected this results in poor
> > performance, something that has been mitigated by proposing running the
> > iommu in non-strict mode (boot with iommu.strict=0).
> >
> > This turns out to be related to the SAFE logic, and programming the QOS
> > SAFE values in the DPU (per suggestion from Rob and Doug) reduces the
> > TLB sync time to below 10us, which means significant less time spent
> > with interrupts disabled and a significant boost in throughput.
>
> I ran some tests with a gigabit ethernet adapter to get an idea of how
> this performs in comparison to using lazy iommu mode ("non-strict"):
>
>                 6.6     6.6-lazy    6.6-dpu     6.6-dpu-lazy
> iperf3 recv     114     941         941         941             MBit/s
> iperf3 send     124     891         703         940             MBit/s
>
> scp recv        14.6    110         110         111             MB/s
> scp send        12.5    98.9        91.5        110             MB/s
>
> This patch in itself indeed improves things quite a bit, but there is
> still some performance that can be gained by using lazy iommu mode.
>
> Notably, lazy mode with this patch applied appears to saturate the link
> in both directions.

Maybe there is still room for SoC-specific udev rules so DMA masters
without firmware can be configured as "lazy", i.e. like:

https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/refs/heads/main/baseboard-trogdor/chromeos-base/chromeos-bsp-baseboard-trogdor/files/98-qcom-nonstrict-iommu.rules

BR,
-R

> Tested-by: Johan Hovold <johan+linaro@xxxxxxxxxx>
>
> Johan
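
For illustration, here is a rough sketch of what a per-device "lazy" rule
could boil down to. It assumes the rule works by writing "DMA-FQ" to the
device's iommu_group/type attribute in sysfs, which is the kernel's
documented knob for moving a group's default domain to lazy (flush-queue)
invalidation. The function name and the device path in the usage note are
hypothetical, and nothing here is taken from the linked rules file.

```python
from pathlib import Path


def set_lazy_iommu(device_dir: str) -> str:
    """Request lazy (flush-queue) invalidation for one device's IOMMU group.

    device_dir is the device's sysfs directory (hypothetical caller input).
    Assumes the kernel exposes <device_dir>/iommu_group/type and accepts
    "DMA-FQ" there; on a real system this needs root.
    Returns the value read back from the attribute after the write.
    """
    type_attr = Path(device_dir) / "iommu_group" / "type"
    # "DMA-FQ" selects DMA translation with batched, lazy TLB invalidation.
    type_attr.write_text("DMA-FQ\n")
    return type_attr.read_text().strip()
```

Usage would look like set_lazy_iommu("/sys/devices/platform/soc@0/<device>"),
with the placeholder replaced by the DMA master in question. A udev rule
would do the same write declaratively at device-add time.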