On Tue, Jun 28, 2022 at 03:01:34PM +0800, Chao Gao wrote:
>From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
>Traditionally swiotlb was not performance critical because it was only
>used for slow devices. But in some setups, like TDX confidential
>guests, all IO has to go through swiotlb. Currently swiotlb only has a
>single lock. Under high IO load with multiple CPUs this can lead to
>significant lock contention on the swiotlb lock. We've seen 20+% CPU
>time in locks in some extreme cases.
>
>This patch splits the swiotlb into individual areas which have their
>own lock. Each CPU tries to allocate in its own area first. Only if
>that fails does it search other areas. On freeing the allocation is
>freed into the area where the memory was originally allocated from.
>
>To avoid doing a full modulo in the main path the number of swiotlb
>areas is always rounded to the next power of two. I believe that's
>not really needed anymore on modern CPUs (which have fast enough
>dividers), but still a good idea on older parts.
>
>The number of areas can be set using the swiotlb option. But to avoid
>every user having to set this option set the default to the number of
>available CPUs. Unfortunately on x86 swiotlb is initialized before
>num_possible_cpus() is available, that is why it uses a custom hook
>called from the early ACPI code.
>
>Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>[ rebase and fix warnings of checkpatch.pl ]
>Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>

Just noticed that Tianyu already posted a variant of this patch. Will
drop this one from my series.
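
For anyone skimming the thread, the area scheme described in the quoted
commit message can be sketched in plain userspace C roughly as below.
This is not the swiotlb code from either patch: struct pool, struct area,
pool_alloc(), pool_free(), MAX_AREAS and SLOTS_PER_AREA are hypothetical
names made up for illustration, and a C11 atomic_flag spinlock stands in
for the kernel's spinlock. It only shows the three ideas from the
description: round the area count up to a power of two so a mask can
replace the modulo, try the CPU's own area first before searching the
others, and free back into the area the allocation came from.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_AREAS      64u  /* hypothetical cap; itself a power of two */
#define SLOTS_PER_AREA 4u   /* hypothetical per-area capacity */

struct area {
	atomic_flag lock;   /* stand-in for a per-area spinlock */
	unsigned int used;  /* slots currently allocated from this area */
};

struct pool {
	unsigned int nareas;          /* always a power of two */
	struct area areas[MAX_AREAS];
};

/* Smallest power of two that is >= n (n == 0 yields 1). */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static void area_lock(struct area *a)
{
	while (atomic_flag_test_and_set(&a->lock))
		;  /* spin until the flag was clear */
}

static void area_unlock(struct area *a)
{
	atomic_flag_clear(&a->lock);
}

/* Default the number of areas to the CPU count, rounded up. */
static void pool_init(struct pool *p, unsigned int ncpus)
{
	unsigned int i, n = round_up_pow2(ncpus);

	if (n > MAX_AREAS)
		n = MAX_AREAS;
	p->nareas = n;
	for (i = 0; i < n; i++) {
		atomic_flag_clear(&p->areas[i].lock);
		p->areas[i].used = 0;
	}
}

/*
 * Take one slot, starting with the caller's own area and walking the
 * remaining areas only if that one is full.
 * Returns the area index used, or -1 if every area is full.
 */
static int pool_alloc(struct pool *p, unsigned int cpu)
{
	unsigned int mask = p->nareas - 1;  /* valid because nareas is 2^k */
	unsigned int i;

	for (i = 0; i < p->nareas; i++) {
		unsigned int idx = (cpu + i) & mask;  /* mask, not modulo */
		struct area *a = &p->areas[idx];
		int ok = 0;

		area_lock(a);
		if (a->used < SLOTS_PER_AREA) {
			a->used++;
			ok = 1;
		}
		area_unlock(a);
		if (ok)
			return (int)idx;
	}
	return -1;
}

/* Return one slot to the area it was originally allocated from. */
static void pool_free(struct pool *p, int idx)
{
	struct area *a = &p->areas[idx];

	area_lock(a);
	assert(a->used > 0);
	a->used--;
	area_unlock(a);
}
```

With pool_init(&p, 3) the pool ends up with 4 areas, and a caller on
CPU 5 starts in area 5 & 3 = 1, spilling into area 2 only once area 1 is
full. Contention is limited to callers that happen to land in the same
area, rather than all callers sharing one lock.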