Subject: Xen requirements for GPU virtualization via virtio-GPU

Recently, AMD submitted patches to the dri-devel mailing list to support
using application-provided buffers in virtio-GPU. This feature is called
Shared Virtual Memory (SVM), and it is implemented via an API called User
Pointer (userptr). This led to some discussion on
dri-devel@xxxxxxxxxxxxxxxxxxxxx and dri-devel IRC, from which I concluded
that Xen is missing critical primitives for GPU-accelerated graphics and
compute. The missing primitives for graphics are the ones discussed at
Xen Project Summit 2024, but it turns out that additional primitives are
needed for compute workloads.

As discussed at Xen Project Summit 2024, GPU acceleration via virtio-GPU
requires that an IOREQ server have access to the following primitives:

1. Map: Map a backend-provided buffer into the frontend. The buffer
   might point to system memory or to a PCIe BAR. The frontend is _not_
   allowed to use these buffers in hypercalls or grant them to other
   domains. Accessing the pages using hypercalls directed at the
   frontend fails as if the frontend did not have the pages. The only
   exception is that the frontend _may_ be allowed to use the buffer in
   a Map operation, provided that Revoke (below) is transitive.

2. Revoke: Revoke access to a buffer provided by the backend. Once
   access is revoked, no operation on or in the frontend domain can
   access or modify the pages, and the backend can safely reuse the
   backing memory for other purposes. Furthermore, revocation is not
   allowed to fail unless the backend or hypervisor is buggy, and if it
   does fail for any reason, the backend will panic. Once access is
   revoked, further accesses by the frontend will cause a fault that
   the backend can intercept.

Map can be handled by userspace, but Revoke must be handled entirely
in-kernel.
This is because Revoke happens from a Linux MMU notifier callback, and
those are not allowed to block, fail, or involve userspace in any way.
Since MMU notifier callbacks are called before freeing memory, failure
means that some other part of the system still has access to freed
memory that might be reused for other purposes, which is a security
vulnerability.

It turns out that compute has additional requirements. Graphics APIs
use DMA buffers (dmabufs), which only support a subset of operations.
In particular, direct I/O doesn't work. Compute APIs allow users to make
malloc'd memory accessible to the GPU. This memory can be used in Linux
kernel direct I/O and in other operations that do not work with dmabufs.
However, such memory starts out as frontend-owned pages, so it must be
converted to backend pages before it can be used by the GPU. Linux
supports migration of userspace pages, but this is too unreliable to be
used for this purpose. Instead, it will need to be done by Xen and the
backend. This requires two additional primitives:

3. Steal: Convert frontend-owned pages to backend-owned pages and
   provide the backend with a mapping of the pages. After a successful
   Steal operation, the pages are in the same state as if they had been
   provided via Map. Steal fails if the pages are currently being used
   in a hypercall, are MMIO (as opposed to system memory), were provided
   by another domain via Map or grant tables, are currently foreign
   mapped, are currently granted to another domain, or more generally
   are accessible to any domain other than the target domain. The
   frontend's quota is decreased by the number of pages stolen, and the
   backend's quota is increased by the same amount. A successful Steal
   operation means that Revoke and Map can be used to operate on the
   pages.

4. Return: Convert a backend-owned page to a frontend-owned page. After
   a successful call to Return, the backend is no longer able to use
   Revoke or Map on the page.
   The returned page ceases to count against backend quota and now
   counts against frontend quota.

Are these operations ones that Xen is interested in providing? There
may be other primitives that are sufficient to implement the above four,
but I believe that any solution that allows virtio-GPU to work must
allow the above four operations to be implemented. Without the first
two, virtio-GPU will not be able to support Vulkan or native contexts,
and without the second two also being present, shared virtual memory and
compute APIs that require it will not work.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
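
For illustration only, the four operations described above (Map, Revoke,
Steal, Return) can be modeled as a small per-page state machine. The
Python sketch below is not Xen or Linux code. The names Page, State,
Accounting, Refused, map_page, revoke, steal and give_back are all
hypothetical. Two further assumptions are made: the quota language is
read as a per-domain count of charged pages, and Return is allowed from
either backend-owned state, since the text above does not say.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    FRONTEND_OWNED = auto()   # ordinary frontend memory
    BACKEND_MAPPED = auto()   # backend-owned, visible to the frontend
    BACKEND_REVOKED = auto()  # backend-owned, frontend accesses fault


class Refused(Exception):
    """Raised where the description above says an operation must fail."""


@dataclass
class Page:
    state: State
    # True if the page is MMIO, in use by a hypercall, granted, foreign
    # mapped, or otherwise reachable from a domain other than the target.
    shared: bool = False


@dataclass
class Accounting:
    # Assumption: "quota" is modeled as the number of pages charged to
    # each domain.
    frontend_pages: int
    backend_pages: int


def map_page(page: Page) -> None:
    """Map: expose a backend-owned page to the frontend."""
    if page.state is State.FRONTEND_OWNED:
        raise Refused("Map only applies to backend-provided pages")
    page.state = State.BACKEND_MAPPED


def revoke(page: Page) -> None:
    """Revoke: never fails for a backend-owned page (idempotent here)."""
    if page.state is State.FRONTEND_OWNED:
        raise Refused("backend has no rights over this page")
    page.state = State.BACKEND_REVOKED


def steal(page: Page, acct: Accounting) -> None:
    """Steal: frontend-owned page becomes backend-owned and mapped."""
    if page.state is not State.FRONTEND_OWNED or page.shared:
        raise Refused("page is not exclusively owned by the frontend")
    acct.frontend_pages -= 1
    acct.backend_pages += 1
    page.state = State.BACKEND_MAPPED  # same state as after Map


def give_back(page: Page, acct: Accounting) -> None:
    """Return: backend-owned page becomes frontend-owned again."""
    if page.state is State.FRONTEND_OWNED:
        raise Refused("page is already frontend-owned")
    acct.backend_pages -= 1
    acct.frontend_pages += 1
    page.state = State.FRONTEND_OWNED
```

In this model, stealing a frontend page, revoking it, mapping it again,
and returning it walks through FRONTEND_OWNED, BACKEND_MAPPED,
BACKEND_REVOKED, BACKEND_MAPPED and back to FRONTEND_OWNED, and leaves
both page counts where they started. A later revoke() on the returned
page raises Refused, matching the rule that the backend loses Revoke and
Map after Return.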