Re: [RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support

On 5/16/2022 3:34 PM, Christoph Hellwig wrote:
> I don't really understand how 'childs' fit in here.  The code also
> doesn't seem to be usable without patch 2 and a caller of the
> new functions added in patch 2, so it is rather impossible to review.

Hi Christoph:
     OK. I will merge the two patches and add a caller patch. The motivation
is to avoid the global spin lock that devices contend on when they use the
swiotlb bounce buffer; it introduces noticeable overhead in high-throughput
cases. In my test environment, the current code achieves about 24Gb/s
network throughput with SWIOTLB forced on, versus about 40Gb/s without
SWIOTLB force. Storage has the same issue.
     Per-device IO TLB mem resolves the global spin lock contention between
devices, but a single device may still have multiple queues, and those
queues would still share one spin lock. That is why the previous patches
introduce child IO TLB mem (IO TLB areas): each device queue gets its own
child IO TLB mem with its own spin lock to manage its IO TLB buffers.
     Otherwise the global spin lock also burns CPU during high throughput,
on top of the throughput regression, because each device queue spins on a
different CPU to acquire the same global lock. Child IO TLB mem may resolve
that CPU usage issue as well.
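     To make that concrete, here is a rough sketch of the layout I have in
mind (the field and helper names below are illustrative only, not the exact
ones used in the patch):

#include <linux/spinlock.h>
#include <linux/types.h>

struct io_tlb_slot;	/* per-slot bookkeeping, as in kernel/dma/swiotlb.c */

/*
 * Sketch only: "num_child" and "child" are illustrative names.  The
 * parent pool is carved into per-queue children, each protected by its
 * own lock, so queues running on different CPUs stop contending on one
 * global spinlock.
 */
struct io_tlb_mem {
	phys_addr_t start;
	phys_addr_t end;
	unsigned long nslabs;
	spinlock_t lock;		/* used only when there are no children */
	unsigned int num_child;
	struct io_tlb_mem *child;	/* array of per-queue child pools */
	struct io_tlb_slot *slots;
};

/*
 * A device queue picks its own child and takes only that child's lock
 * on the bounce-buffer allocation fast path.
 */
static struct io_tlb_mem *
swiotlb_get_child(struct io_tlb_mem *mem, unsigned int queue_index)
{
	if (mem->num_child)
		return &mem->child[queue_index % mem->num_child];
	return mem;
}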


> Also:

>   1) why is SEV/TDX so different from other cases that need bounce
>      buffering to treat it different and we can't work on a general
>      scalability improvement

	Other cases have the same global spin lock issue; whether it matters
	depends on whether the lock becomes the bottleneck, and the CPU usage
	cost may be negligible there.

>   2) per previous discussions at how swiotlb itself works, it is
>      clear that another option is to just make pages we DMA to
>      shared with the hypervisor.  Why don't we try that at least
>      for larger I/O?

	For a confidential VM (both TDX and SEV), we need the bounce buffer
	to copy between private memory, which the hypervisor cannot access
	directly, and shared memory. For security reasons, a confidential VM
	should not share the IO stack's DMA pages with the hypervisor
	directly, to avoid attacks from the hypervisor while the IO stack
	handles the DMA data.
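
	To illustrate, the bounce step amounts to something like the
	following (simplified, not the real swiotlb_bounce() code):

#include <linux/dma-direction.h>
#include <linux/string.h>

/*
 * Simplified illustration only: tlb_vaddr points into the shared
 * swiotlb pool the hypervisor may access, orig_vaddr into the guest's
 * private (encrypted) buffer it may not.  Only the copied bytes ever
 * cross the trust boundary; the private page itself is never shared
 * with the hypervisor.
 */
static void bounce_copy(void *tlb_vaddr, void *orig_vaddr, size_t size,
			enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE)
		memcpy(tlb_vaddr, orig_vaddr, size);	/* private -> shared */
	else
		memcpy(orig_vaddr, tlb_vaddr, size);	/* shared -> private */
}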
	


