On 4/28/2022 10:44 PM, Robin Murphy wrote:
On 2022-04-28 15:14, Tianyu Lan wrote:
From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significat lock contention on the swiotlb lock.
This patch splits the swiotlb into individual areas which have their
own lock. When there are swiotlb map/allocate request, allocate
io tlb buffer from areas averagely and free the allocation back
to the associated area. This is to prepare to resolve the overhead
of single spinlock among device's queues. Per device may have its
own io tlb mem and bounce buffer pool.
This idea from Andi Kleen
patch(https://github.com/intel/tdx/commit/4529b578
4c141782c72ec9bd9a92df2b68cb7d45). Rework it and make it may work
for individual device's io tlb mem. The device driver may determine
area number according to device queue number.
Rather than introduce this extra level of allocator complexity, how
about just dividing up the initial SWIOTLB allocation into multiple
io_tlb_mem instances?
Robin.
Agree. Thanks for suggestion. That will be more generic and will update
in the next version.
Thanks.