Am 13.03.25 um 13:50 schrieb Thomas Hellström: > Hi, Christian > > On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote: >> Am 12.03.25 um 22:03 schrieb Thomas Hellström: >>> This RFC implements and requests comments for a way to handle SVM >>> with multi-device, >>> typically with fast interconnects. It adds generic code and helpers >>> in drm, and >>> device-specific code for xe. >>> >>> For SVM, devices set up maps of device-private struct pages, using >>> a struct dev_pagemap, >>> The CPU virtual address space (mm), can then be set up using >>> special page-table entries >>> to point to such pages, but they can't be accessed directly by the >>> CPU, but possibly >>> by other devices using a fast interconnect. This series aims to >>> provide helpers to >>> identify pagemaps that take part in such a fast interconnect and to >>> aid in migrating >>> between them. >>> >>> This is initially done by augmenting the struct dev_pagemap with a >>> struct drm_pagemap, >>> and having the struct drm_pagemap implement a "populate_mm" method, >>> where a region of >>> the CPU virtual address space (mm) is populated with device_private >>> pages from the >>> dev_pagemap associated with the drm_pagemap, migrating data from >>> system memory or other >>> devices if necessary. The drm_pagemap_populate_mm() function is >>> then typically called >>> from a fault handler, using the struct drm_pagemap pointer of >>> choice. It could be >>> referencing a local drm_pagemap or a remote one. The migration is >>> now completely done >>> by drm_pagemap callbacks, (typically using a copy-engine local to >>> the dev_pagemap local >>> memory). >> Up till here that makes sense. Maybe not necessary to be put into the >> DRM layer, but that is an implementation detail. >> >>> In addition there are helpers to build a drm_pagemap UAPI using >>> file-descripors >>> representing struct drm_pagemaps, and a helper to register devices >>> with a common >>> fast interconnect. The UAPI is intended to be private to the >>> device, but if drivers >>> agree to identify struct drm_pagemaps by file descriptors one could >>> in theory >>> do cross-driver multi-device SVM if a use-case were found. >> But this completely eludes me. >> >> Why would you want an UAPI for representing pagemaps as file >> descriptors? Isn't it the kernel which enumerates the interconnects >> of the devices? >> >> I mean we somehow need to expose those interconnects between devices >> to userspace, e.g. like amdgpu does with it's XGMI connectors. But >> that is static for the hardware (unless HW is hot removed/added) and >> so I would assume exposed through sysfs. > Thanks for the feedback. > > The idea here is not to expose the interconnects but rather have a way > for user-space to identify a drm_pagemap and some level of access- and > lifetime control. Well that's what I get I just don't get why? I mean when you want to have the pagemap as optional feature you can turn on and off I would say make that a sysfs file. It's a global feature anyway and not bound in any way to the file descriptor, isn't it? > For Xe, If an application wants to use a particular drm_pagemap it > calls an ioctl: > > pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd, > memory_region); Well should userspace deal with physical addresses here, or what exactly is memory_region here? Regards, Christian. > > And then when it's no longer used > close(pagemap_fd) > > To use it for a memory range, the intended idea is call gpu madvise > ioctl: > > err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd); > > Now, if there is no fast interconnect between the two, the madvise call > could just return an error. All this ofc assumes that user-space is > somehow aware of the fast interconnect topology but how that is exposed > is beyond the scope of this first series. (Suggestions welcome). > > The advantage of the above approach is > 1) We get some level of access control. If the user doesn't have access > to the exporting device, he/she can't obtain a pagemap file descriptor. > > 2) Lifetime control. The pagemaps are memory hungry, but also take > considerable time to set up and tear down. > > 3) It's a driver-independent approach. > > One could ofc use a different approach by feeding the gpu_madvise() > ioctl with a remote device file descriptor and whatever information is > needed for the remote device to identify the drm_pagemap. That would > not be driver independent, though. Not sure how important that is. > > /Thomas > > >> Thanks, >> Christian. >> >>> The implementation for the Xe driver uses dynamic pagemaps which >>> are created on first >>> use and removed 5s after the last reference is gone. Pagemaps are >>> revoked on >>> device unbind, and data is then migrated to system. >>> >>> Status: >>> This is a POC series. It has been tested with an IGT test soon to >>> be published, with a >>> DG1 drm_pagemap and a BattleMage SVM client. There is separate work >>> ongoing for the >>> gpu_madvise functionality. >>> >>> The Xe implementation of the "populate_mm()" callback is >>> still rudimentary and doesn't migrate from foreign devices. It >>> should be tuned to do >>> smarter choices. >>> >>> Any feedback appreciated. >>> >>> Patch overview: >>> Patch 1: >>> - Extends the way the Xe driver can compile out SVM support and >>> pagemaps. >>> Patch 2: >>> - Fixes an existing potential UAF in the Xe SVM code. >>> Patch 3: >>> - Introduces the drm_pagemap.c file and moves drm_pagemap >>> functionality to it. >>> Patch 4: >>> - Adds a populate_mm op to drm_pagemap. >>> Patch 5: >>> - Implement Xe's version of the populate_mm op. >>> Patch 6: >>> - Refcount struct drm_pagemap. >>> Patch 7: >>> - Cleanup patch. >>> Patch 8: >>> - Add a bo_remove callback for Xe, Used during device unbind. >>> Patch 9: >>> - Add a drm_pagemap utility to calculate a common owner structure >>> Patch 10: >>> - Adopt GPUSVM to a (sort of) dynamic owner. >>> Patch 11: >>> - Xe calculates the dev_private owner using the drm_pagemap >>> utility. >>> Patch 12: >>> - Update the Xe page-table code to handle per range mixed system / >>> device_private placement. >>> Patch 13: >>> - Modify GPUSVM to allow such placements. >>> Patch 14: >>> - Add a preferred pagemap to use by the Xe fault handler. >>> Patch 15: >>> - Add a utility that converts between drm_pagemaps and file- >>> descriptors and back. >>> Patch 16: >>> - Fix Xe so that also devices without fault capability can publish >>> drm_pagemaps. >>> Patch 17: >>> - Add the devmem_open UAPI, creating a drm_pagemap file descriptor >>> from a >>> (device, region) pair. >>> Patch 18: >>> - (Only for POC) Add an GPU madvise prefer_devmem IOCTL. >>> Patch 19: >>> - (Only for POC) Implement pcie p2p DMA as a fast interconnect and >>> test. >>> >>> Matthew Brost (1): >>> drm/gpusvm, drm/pagemap: Move migration functionality to >>> drm_pagemap >>> >>> Thomas Hellström (18): >>> drm/xe: Introduce CONFIG_DRM_XE_GPUSVM >>> drm/xe/svm: Fix a potential bo UAF >>> drm/pagemap: Add a populate_mm op >>> drm/xe: Implement and use the drm_pagemap populate_mm op >>> drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and >>> manage >>> lifetime >>> drm/pagemap: Get rid of the struct >>> drm_pagemap_zdd::device_private_page_owner field >>> drm/xe/bo: Add a bo remove callback >>> drm/pagemap_util: Add a utility to assign an owner to a set of >>> interconnected gpus >>> drm/gpusvm, drm/xe: Move the device private owner to the >>> drm_gpusvm_ctx >>> drm/xe: Use the drm_pagemap_util helper to get a svm pagemap >>> owner >>> drm/xe: Make the PT code handle placement per PTE rather than per >>> vma >>> / range >>> drm/gpusvm: Allow mixed mappings >>> drm/xe: Add a preferred dpagemap >>> drm/pagemap/util: Add file descriptors pointing to struct >>> drm_pagemap >>> drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault >>> capable >>> devices >>> drm/xe/uapi: Add the devmem_open ioctl >>> drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL >>> drm/xe: HAX: Use pcie p2p dma to test fast interconnect >>> >>> Documentation/gpu/rfc/gpusvm.rst | 12 +- >>> drivers/gpu/drm/Makefile | 7 +- >>> drivers/gpu/drm/drm_gpusvm.c | 782 +--------------------- >>> drivers/gpu/drm/drm_pagemap.c | 940 >>> +++++++++++++++++++++++++++ >>> drivers/gpu/drm/drm_pagemap_util.c | 203 ++++++ >>> drivers/gpu/drm/xe/Kconfig | 24 +- >>> drivers/gpu/drm/xe/Makefile | 2 +- >>> drivers/gpu/drm/xe/xe_bo.c | 65 +- >>> drivers/gpu/drm/xe/xe_bo.h | 2 + >>> drivers/gpu/drm/xe/xe_bo_types.h | 2 +- >>> drivers/gpu/drm/xe/xe_device.c | 8 + >>> drivers/gpu/drm/xe/xe_device_types.h | 30 +- >>> drivers/gpu/drm/xe/xe_migrate.c | 8 +- >>> drivers/gpu/drm/xe/xe_pt.c | 112 ++-- >>> drivers/gpu/drm/xe/xe_query.c | 2 +- >>> drivers/gpu/drm/xe/xe_svm.c | 716 +++++++++++++++++--- >>> drivers/gpu/drm/xe/xe_svm.h | 158 ++++- >>> drivers/gpu/drm/xe/xe_tile.c | 20 +- >>> drivers/gpu/drm/xe/xe_tile.h | 33 + >>> drivers/gpu/drm/xe/xe_vm.c | 6 +- >>> drivers/gpu/drm/xe/xe_vm_types.h | 7 + >>> include/drm/drm_gpusvm.h | 102 +-- >>> include/drm/drm_pagemap.h | 190 +++++- >>> include/drm/drm_pagemap_util.h | 59 ++ >>> include/uapi/drm/xe_drm.h | 39 ++ >>> 25 files changed, 2458 insertions(+), 1071 deletions(-) >>> create mode 100644 drivers/gpu/drm/drm_pagemap.c >>> create mode 100644 drivers/gpu/drm/drm_pagemap_util.c >>> create mode 100644 include/drm/drm_pagemap_util.h >>>