Xen memory management primitives for GPU virtualization

Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx> · Sun, 2 Feb 2025 00:08:46 -0500

Subject: Xen requirements for GPU virtualization via virtio-GPU

Recently, AMD submitted patches to the dri-devel mailing list to support
using application-provided buffers in virtio-GPU.  This feature is
called Shared Virtual Memory (SVM) and it is implemented via an API
called User Pointer (userptr).  This led to some discussion on
dri-devel@xxxxxxxxxxxxxxxxxxxxx and dri-devel IRC, from which I
concluded that Xen is missing critical primitives for GPU-accelerated
graphics and compute.  The missing primitives for graphics are the ones
discussed at Xen Project Summit 2024, but it turns out that additional
primitives are needed for compute workloads.

As discussed at Xen Project Summit 2024, GPU acceleration via virtio-GPU
requires that an IOREQ server have access to the following primitives:

1. Map: Map a backend-provided buffer into the frontend.  The buffer
   might point to system memory or to a PCIe BAR.  The frontend is _not_
   allowed to use these buffers in hypercalls or grant them to other
   domains.  Accessing the pages using hypercalls directed at the
   frontend fails as if the frontend did not have the pages.  The only
   exception is that the frontend _may_ be allowed to use the buffer in
   a Map operation, provided that Revoke (below) is transitive.

2. Revoke: Revoke access to a buffer provided by the backend.  Once
   access is revoked, no operation on or in the frontend domain can
   access or modify the pages, and the backend can safely reuse the
   backing memory for other purposes.  Furthermore, revocation is not
   allowed to fail unless the backend or hypervisor is buggy, and if it
   does fail for any reason, the backend will panic.  Once access is
   revoked, further accesses by the frontend will cause a fault that the
   backend can intercept.
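
To make the intended semantics of Map and Revoke concrete, here is a
minimal toy model.  It is only a sketch: ToyHypervisor, Fault, and every
method name are hypothetical and do not correspond to any existing Xen
or Linux interface.  It assumes that Revoke drops the buffer from every
domain at once, which is one way to get the transitive behaviour
described above.

```python
# Toy model of Map and Revoke.  All names are hypothetical; none of
# this is a Xen or Linux interface.

class Fault(Exception):
    """A domain touched a guest frame it has no access to."""


class ToyHypervisor:
    def __init__(self):
        self.mappings = {}     # (domain, gfn) -> backend buffer id
        self.owned = set()     # (domain, gfn) pages the domain itself owns
        self.intercepted = []  # faults forwarded to the backend

    def map(self, domain, gfn, buf_id):
        """Map: expose a backend-provided buffer at a guest frame."""
        self.mappings[(domain, gfn)] = buf_id

    def revoke(self, buf_id):
        """Revoke: drop every mapping of the buffer, in every domain.

        There is deliberately no failure path in this model.
        """
        for key in [k for k, b in self.mappings.items() if b == buf_id]:
            del self.mappings[key]

    def access(self, domain, gfn):
        """CPU access from a domain; unmapped accesses fault."""
        key = (domain, gfn)
        if key in self.owned:
            return "own-page"
        if key not in self.mappings:
            self.intercepted.append(key)  # the backend sees the fault
            raise Fault(key)
        return self.mappings[key]

    def hypercall_on(self, domain, gfn):
        """A hypercall naming a page only works on pages the domain owns."""
        if (domain, gfn) not in self.owned:
            raise PermissionError("domain does not own this page")
        return True


hv = ToyHypervisor()
hv.map("frontend", 0x1000, "buf0")
print(hv.access("frontend", 0x1000))  # visible while mapped
hv.revoke("buf0")
try:
    hv.access("frontend", 0x1000)
except Fault:
    print("faults seen by backend:", hv.intercepted)
```

Note that hypercall_on() rejects a mapped buffer in exactly the same way
as a page the frontend never had, matching the rule in item 1.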

Map can be handled by userspace, but Revoke must be handled entirely
in-kernel.  This is because Revoke happens from a Linux MMU notifier
callback, and those are not allowed to block, fail, or involve userspace
in any way.  Since MMU notifier callbacks are called before freeing
memory, failure means that some other part of the system still has
access to freed memory that might be reused for other purposes, which
is a security vulnerability.
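
The ordering problem can be illustrated with a small model.  This is a
sketch under stated assumptions, not kernel code: ToyKernel and its
methods are hypothetical, and "deferred" stands in for any design that
hands revocation to userspace instead of finishing it inside the
notifier callback.

```python
# Toy model of why Revoke must complete inside the notifier callback.
# All names are hypothetical; this is not the Linux MMU notifier API.

class ToyKernel:
    def __init__(self):
        self.frontend_view = {}  # gfn -> page currently visible to the frontend
        self.pending = []        # revocations handed off to a userspace helper

    def revoke_now(self, page):
        """In-kernel Revoke: the mapping is gone when this returns."""
        self.frontend_view = {
            gfn: p for gfn, p in self.frontend_view.items() if p != page
        }

    def revoke_deferred(self, page):
        """Userspace-assisted Revoke: only queues the work."""
        self.pending.append(page)

    def free_and_reuse(self, page, notifier):
        """The notifier runs first; the page is freed right after it returns."""
        notifier(page)
        return page  # from here on the page may belong to someone else

    def frontend_can_see(self, page):
        return page in self.frontend_view.values()


safe = ToyKernel()
safe.frontend_view[5] = "pageA"
safe.free_and_reuse("pageA", safe.revoke_now)
print("in-kernel revoke, frontend still sees page:",
      safe.frontend_can_see("pageA"))

racy = ToyKernel()
racy.frontend_view[5] = "pageA"
racy.free_and_reuse("pageA", racy.revoke_deferred)
print("deferred revoke, frontend still sees page:",
      racy.frontend_can_see("pageA"))
```

In the deferred case the frontend keeps a mapping of a page that has
already been freed, which is the vulnerability described above.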

It turns out that compute has additional requirements.  Graphics APIs
use DMA buffers (dmabufs), which only support a subset of operations.
In particular, direct I/O doesn't work.  Compute APIs allow users to
make malloc'd memory accessible to the GPU.  This memory can be used
in Linux kernel direct I/O and in other operations that do not work
with dmabufs.  However, such memory starts out as frontend-owned pages,
so it must be converted to backend pages before it can be used by the
GPU.  Linux supports migration of userspace pages, but this is too
unreliable to be used for this purpose.  Instead, it will need to be
done by Xen and the backend.

This requires two additional primitives:

3. Steal: Convert frontend-owned pages to backend-owned pages and
   provide the backend with a mapping of those pages.  After a successful
   Steal operation, the pages are in the same state as if they had been
   provided via Map.  Steal fails if the pages are currently being used
   in a hypercall, are MMIO (as opposed to system memory), were provided
   by another domain via Map or grant tables, are currently foreign
   mapped, are currently granted to another domain, or more generally
   are accessible to any domain other than the target domain.  The
   frontend's quota is decreased by the number of pages stolen, and the
   backend's quota is increased by the same amount.  A successful Steal
   operation means that Revoke and Map can be used to operate on the
   pages.

4. Return: Convert a backend-owned page to a frontend-owned page.  After
   a successful call to Return, the backend is no longer able to use
   Revoke or Map.  The returned page ceases to count against backend
   quota and now counts against frontend quota.
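
The eligibility checks and accounting for Steal and Return can be
sketched as follows.  This is an illustrative model only: Page,
ToyAccounting, StealError, and the field names are hypothetical, pages
are handled one at a time, and treating Return as the exact inverse of
Steal's quota adjustment is an assumption rather than something the
description above pins down.

```python
# Toy model of Steal and Return.  All names are hypothetical; none of
# this is a Xen interface.
from dataclasses import dataclass


class StealError(Exception):
    """The page is not eligible for Steal."""


@dataclass
class Page:
    owner: str                       # "frontend" or "backend"
    is_mmio: bool = False            # MMIO rather than system memory
    in_hypercall: bool = False       # currently used by a hypercall
    from_other_domain: bool = False  # provided via Map or grant tables
    foreign_mapped: bool = False     # currently foreign mapped
    granted_out: bool = False        # currently granted to another domain


BLOCKERS = ("is_mmio", "in_hypercall", "from_other_domain",
            "foreign_mapped", "granted_out")


class ToyAccounting:
    def __init__(self, frontend_quota, backend_quota):
        self.quota = {"frontend": frontend_quota, "backend": backend_quota}

    def steal(self, page):
        """Steal: frontend-owned page becomes backend-owned."""
        reasons = [name for name in BLOCKERS if getattr(page, name)]
        if page.owner != "frontend" or reasons:
            raise StealError(reasons or ["not frontend-owned"])
        page.owner = "backend"
        self.quota["frontend"] -= 1
        self.quota["backend"] += 1

    def give_back(self, page):
        """Return: backend-owned page becomes frontend-owned again.

        Named give_back because 'return' is a Python keyword.
        """
        if page.owner != "backend":
            raise ValueError("page is not backend-owned")
        page.owner = "frontend"
        self.quota["backend"] -= 1
        self.quota["frontend"] += 1


acct = ToyAccounting(frontend_quota=100, backend_quota=10)
page = Page(owner="frontend")
acct.steal(page)
print(page.owner, acct.quota)
acct.give_back(page)
print(page.owner, acct.quota)
```

Every blocker in the model makes Steal fail outright; no partial or
retried conversion is attempted.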

Are these operations ones that Xen is interested in providing?  There
may be other primitives that are sufficient to implement the above four,
but I believe that any solution that allows virtio-GPU to work must
allow the above four operations to be implemented.  Without the first
two, virtio-GPU will not be able to support Vulkan or native contexts,
and without the last two also being present, shared virtual memory
and compute APIs that require it will not work.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab