On 15.11.24 17:47, Rob Nertney wrote:
On Tue, Oct 08, 2024 at 04:59:45PM +0800, Chenyi Qiang wrote:
Hi Paolo,
Kindly ping for this thread. The in-place page conversion is discussed
at Linux Plumbers. Does it give some direction for shared device
assignment enabling work?
Hi everybody.
Hi,
Our NVIDIA GPUs currently support this shared-memory/bounce-buffer method to
provide AI acceleration within TEE CVMs. We require passing through the GPU via
VFIO stubbing, which means that we are impacted by the absence of an API to
inform VFIO about page conversions.
The CSPs have enough kernel engineers who handle this process in their own host
kernels, but we have several enterprise customers who are eager to begin using
this solution in the upstream. AMD has successfully ported enough of the
SEV-SNP support into 6.11 and our initial testing shows successful operation,
but only by disabling discard via these two QEMU patches:
- https://github.com/AMDESE/qemu/commit/0c9ae28d3e199de9a40876a492e0f03a11c6f5d8
- https://github.com/AMDESE/qemu/commit/5256c41fb3055961ea7ac368acc0b86a6632d095
This "workaround" is a bit of a hack, as it effectively requires more than
twice as much host memory as has to be allocated to the guest CVM. The
proposal here appears to be a promising workaround; are there other solutions
that are recommended for this use case?
What people are working on is supporting private and shared memory in
guest_memfd, and allowing an in-place conversion between shared and
private: this avoids discards + reallocation and consequently any double
memory allocation.
To get stuff into VFIO, we must only map the currently shared pages
(VFIO will pin + map them), and unmap them (VFIO will unmap + unpin
them) before converting them to private.
This series should likely achieve the
unmap-before-conversion-to-private, and map-after-conversion-to-shared,
such that it could be compatible with guest_memfd.
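
As a rough illustration of that ordering, here is a toy Python model. It
is not QEMU or kernel code: the class, its methods, and the assumption
that all pages start out private are made up for this sketch. It only
shows the invariant that a page is DMA-mapped strictly while it is shared.

```python
from enum import Enum


class PageState(Enum):
    SHARED = "shared"
    PRIVATE = "private"


class ToyConversionTracker:
    """Toy model only: per-page state plus the set of DMA-mapped pages."""

    def __init__(self, num_pages):
        # Assumption for the sketch: every page starts out private.
        self.state = [PageState.PRIVATE] * num_pages
        self.dma_mapped = set()  # page indices currently pinned + mapped
        self.log = []            # ordered record of the steps taken

    def convert_to_shared(self, page):
        # Map after conversion to shared: convert first, then pin + map.
        self.state[page] = PageState.SHARED
        self.log.append(("convert_shared", page))
        self._dma_map(page)

    def convert_to_private(self, page):
        # Unmap before conversion to private: unmap + unpin, then convert.
        if page in self.dma_mapped:
            self._dma_unmap(page)
        self.state[page] = PageState.PRIVATE
        self.log.append(("convert_private", page))

    def _dma_map(self, page):
        # Only currently shared pages may ever be mapped for DMA.
        if self.state[page] is not PageState.SHARED:
            raise RuntimeError("refusing to DMA-map a private page")
        self.dma_mapped.add(page)
        self.log.append(("dma_map", page))

    def _dma_unmap(self, page):
        self.dma_mapped.discard(page)
        self.log.append(("dma_unmap", page))
```

Converting a page to shared and back to private leaves nothing mapped,
and the log records the unmap ahead of the private conversion.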
QEMU would simply mmap the guest_memfd to obtain a user space mapping,
from which it can pass address ranges to VFIO like we already do. This
user space mapping only allows for shared pages to be faulted in.
Currently private pages cannot be faulted in (inaccessible -> SIGBUS).
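
To make the faulting rule concrete, a second toy Python sketch follows.
Again this is a hypothetical model, not the guest_memfd or VFIO API: the
exception stands in for SIGBUS, and shared_ranges() stands in for the
address ranges that user space would hand to VFIO.

```python
class ToyBusError(Exception):
    """Stand-in for the SIGBUS raised on access to a private page."""


class ToyGuestMemfdMapping:
    """Toy model of a user space mapping that only faults in shared pages."""

    def __init__(self, num_pages):
        # Assumption for the sketch: every page starts out private.
        self.shared = [False] * num_pages

    def set_shared(self, page, is_shared):
        self.shared[page] = is_shared

    def fault_in(self, page):
        # Private pages are inaccessible through the mapping.
        if not self.shared[page]:
            raise ToyBusError(f"page {page} is private")
        return page

    def shared_ranges(self):
        # Collect contiguous runs of shared pages as (first_page, count).
        ranges = []
        start = None
        for i, is_shared in enumerate(self.shared):
            if is_shared and start is None:
                start = i
            elif not is_shared and start is not None:
                ranges.append((start, i - start))
                start = None
        if start is not None:
            ranges.append((start, len(self.shared) - start))
        return ranges
```

With pages 0, 1 and 3 of a five-page mapping marked shared, the model
yields two ranges, and touching page 2 raises the stand-in error.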
So far the theory.
I'll note that this is likely not the most elegant solution, but it is
something that would give us one solution to the problem in a reasonable
timeframe.
Cheers!
--
Cheers,
David / dhildenb