From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb has only a
single lock. Under high IO load with multiple CPUs this can lead to
significant contention on that lock.

This patchset splits the swiotlb into individual areas, each with its
own lock. When a swiotlb map/allocate request arrives, the IO TLB buffer
is allocated evenly across the areas, and freeing returns the allocation
to the area it came from.

Patch 2 introduces a helper function to allocate bounce buffers from the
default IO TLB pool for devices with a new IO TLB block unit, and to set
up IO TLB areas for device queues to avoid spinlock overhead. The number
of areas is set by the device driver according to its queue count.

In a network test between a traditional VM and a confidential VM,
throughput improves from ~20Gb/s to ~34Gb/s with this patchset.

Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  Swiotlb: Add device bounce buffer allocation interface

 include/linux/swiotlb.h |  58 +++++++
 kernel/dma/swiotlb.c    | 340 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 362 insertions(+), 36 deletions(-)

-- 
2.25.1
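
For readers who want a picture of the per-area locking idea described above, here is a minimal userspace sketch. It is not the code from these patches. The names (struct area, slot_alloc, slot_free, NAREAS, SLOTS_PER_AREA) and the C11 atomic_flag spinlock are assumptions made for illustration only. The kernel implementation uses its own spinlocks and slot bookkeeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NAREAS         4   /* hypothetical number of areas */
#define SLOTS_PER_AREA 8   /* hypothetical slots per area */

/* One area: a private lock plus the slots it protects. */
struct area {
	atomic_flag lock;
	bool used[SLOTS_PER_AREA];
};

static struct area areas[NAREAS];

static void area_lock(struct area *a)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		;
}

static void area_unlock(struct area *a)
{
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

static void areas_init(void)
{
	for (size_t i = 0; i < NAREAS; i++) {
		atomic_flag_clear(&areas[i].lock);
		for (size_t j = 0; j < SLOTS_PER_AREA; j++)
			areas[i].used[j] = false;
	}
}

/*
 * Allocate one slot. 'hint' (e.g. a CPU or queue index) selects the
 * starting area, so different callers usually take different locks.
 * If that area is full, fall through to the next one.
 * Returns a global slot index, or -1 if every area is full.
 */
static int slot_alloc(unsigned int hint)
{
	for (size_t n = 0; n < NAREAS; n++) {
		size_t idx = (hint + n) % NAREAS;
		struct area *a = &areas[idx];
		int found = -1;

		area_lock(a);
		for (size_t j = 0; j < SLOTS_PER_AREA; j++) {
			if (!a->used[j]) {
				a->used[j] = true;
				found = (int)(idx * SLOTS_PER_AREA + j);
				break;
			}
		}
		area_unlock(a);

		if (found >= 0)
			return found;
	}
	return -1;
}

/* Free a slot back to the area it belongs to, taking only that lock. */
static void slot_free(int slot)
{
	assert(slot >= 0 && slot < NAREAS * SLOTS_PER_AREA);

	struct area *a = &areas[slot / SLOTS_PER_AREA];

	area_lock(a);
	a->used[slot % SLOTS_PER_AREA] = false;
	area_unlock(a);
}
```

The point of the sketch is that an allocation and the matching free touch only one area's lock, so callers that start from different areas do not contend unless an area fills up and the allocator spills into a neighbour.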