On Tue, Apr 02, 2024 at 04:22:31PM -0500, Andrew Halaney wrote:
> Hey,
>
> Sorry for the wide email, but I figured someone recently contributing
> to / maintaining the Qualcomm SMMU driver may have some proper insights
> into this.
>
> Recently I remembered that performance on some Qualcomm platforms
> takes a major hit when you use iommu.strict=1/CONFIG_IOMMU_DEFAULT_DMA_STRICT.
>
> On the sa8775p-ride, I see most TLB sync calls take about 150 us,
> with some spiking to 500 us, etc.:
>
> [root@qti-snapdragon-ride4-sa8775p-09 ~]# trace-cmd start -p function_graph -g qcom_smmu_tlb_sync --max-graph-depth 1
>   plugin 'function_graph'
> [root@qti-snapdragon-ride4-sa8775p-09 ~]# trace-cmd show
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  0) ! 144.062 us  |  qcom_smmu_tlb_sync();
>
> On my sc8280xp-lenovo-thinkpad-x13s (the only other Qualcomm platform I can
> compare with) I see around 2-15 us, with spikes up to 20-30 us. That's thanks
> to this patch[0], which I guess improved the platform from 1-2 ms to the
> ~10 us number.
>
> It's not entirely clear to me how DPU-specific programming affects system-wide
> SMMU performance, but I'm curious whether this is the only way to achieve this.
> sa8775p doesn't even have the DPU described right now, so that's a bummer,
> as there's no way to make a similar immediate optimization. I'm also still
> struggling to understand what that patch really did to improve things, so
> maybe I'm missing something.
>

The cause was that the TLB sync is synchronized with the display updates,
but without appropriate safe_lut_tbl values the display side wouldn't play
nice.

Regards,
Bjorn

> I'm honestly not even sure what a "typical" range for TLB sync time would be,
> but on sa8775p-ride it's bad enough that some IRQs like UFS can cause RCU stalls
> (pretty easy to reproduce with fio basic-verify.fio, for example, on the platform).
> It also makes running with iommu.strict=1 impractical, as performance for UFS,
> ethernet, etc. drops by 75-80%.
>
> Does anyone have any bright ideas on how to improve this, or whether I'm even
> right in assuming that time is suspiciously long?
>
> Thanks,
> Andrew
>
> [0] https://lore.kernel.org/linux-arm-msm/CAF6AEGs9PLiCZdJ-g42-bE6f9yMR6cMyKRdWOY5m799vF9o4SQ@xxxxxxxxxxxxxx/