Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2025-03-13 at 13:57 +0100, Christian König wrote:
> Am 13.03.25 um 13:50 schrieb Thomas Hellström:
> > Hi, Christian
> > 
> > On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
> > > Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> > > > This RFC implements and requests comments for a way to handle
> > > > SVM
> > > > with multi-device,
> > > > typically with fast interconnects. It adds generic code and
> > > > helpers
> > > > in drm, and
> > > > device-specific code for xe.
> > > > 
> > > > For SVM, devices set up maps of device-private struct pages,
> > > > using
> > > > a struct dev_pagemap,
> > > > The CPU virtual address space (mm), can then be set up using
> > > > special page-table entries
> > > > to point to such pages, but they can't be accessed directly by
> > > > the
> > > > CPU, but possibly
> > > > by other devices using a fast interconnect. This series aims to
> > > > provide helpers to
> > > > identify pagemaps that take part in such a fast interconnect
> > > > and to
> > > > aid in migrating
> > > > between them.
> > > > 
> > > > This is initially done by augmenting the struct dev_pagemap
> > > > with a
> > > > struct drm_pagemap,
> > > > and having the struct drm_pagemap implement a "populate_mm"
> > > > method,
> > > > where a region of
> > > > the CPU virtual address space (mm) is populated with
> > > > device_private
> > > > pages from the
> > > > dev_pagemap associated with the drm_pagemap, migrating data
> > > > from
> > > > system memory or other
> > > > devices if necessary. The drm_pagemap_populate_mm() function is
> > > > then typically called
> > > > from a fault handler, using the struct drm_pagemap pointer of
> > > > choice. It could be
> > > > referencing a local drm_pagemap or a remote one. The migration
> > > > is
> > > > now completely done
> > > > by drm_pagemap callbacks, (typically using a copy-engine local
> > > > to
> > > > the dev_pagemap local
> > > > memory).
> > > Up till here that makes sense. Maybe not necessary to be put into
> > > the
> > > DRM layer, but that is an implementation detail.
> > > 
> > > > In addition there are helpers to build a drm_pagemap UAPI using
> > > > file-descripors
> > > > representing struct drm_pagemaps, and a helper to register
> > > > devices
> > > > with a common
> > > > fast interconnect. The UAPI is intended to be private to the
> > > > device, but if drivers
> > > > agree to identify struct drm_pagemaps by file descriptors one
> > > > could
> > > > in theory
> > > > do cross-driver multi-device SVM if a use-case were found.
> > > But this completely eludes me.
> > > 
> > > Why would you want an UAPI for representing pagemaps as file
> > > descriptors? Isn't it the kernel which enumerates the
> > > interconnects
> > > of the devices?
> > > 
> > > I mean we somehow need to expose those interconnects between
> > > devices
> > > to userspace, e.g. like amdgpu does with it's XGMI connectors.
> > > But
> > > that is static for the hardware (unless HW is hot removed/added)
> > > and
> > > so I would assume exposed through sysfs.
> > Thanks for the feedback.
> > 
> > The idea here is not to expose the interconnects but rather have a
> > way
> > for user-space to identify a drm_pagemap and some level of access-
> > and
> > lifetime control.
> 
> Well that's what I get I just don't get why?
> 
> I mean when you want to have the pagemap as optional feature you can
> turn on and off I would say make that a sysfs file.
> 
> It's a global feature anyway and not bound in any way to the file
> descriptor, isn't it?

As it is currently coded in Xe, the drm_pagemap is revoked when the the
struct files backing the fds are released. (After a delay that is, to
avoid repeatedly destroying and re-creating the pagemap).

Will take the sysfs switch back to the team and discuss what's the
preferred option.

> 
> > For Xe, If an application wants to use a particular drm_pagemap it
> > calls an ioctl:
> > 
> > pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd,
> > memory_region);
> 
> Well should userspace deal with physical addresses here, or what
> exactly is memory_region here?

On some hardware we have multiple VRAM banks per device, one per
"tile". The memory_region selects which one to use.

/Thomas


> 
> Regards,
> Christian.
> 
> > 
> > And then when it's no longer used
> > close(pagemap_fd)
> > 
> > To use it for a memory range, the intended idea is call gpu madvise
> > ioctl:
> >  
> > err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd);
> > 
> > Now, if there is no fast interconnect between the two, the madvise
> > call
> > could just return an error. All this ofc assumes that user-space is
> > somehow aware of the fast interconnect topology but how that is
> > exposed
> > is beyond the scope of this first series. (Suggestions welcome).
> > 
> > The advantage of the above approach is
> > 1) We get some level of access control. If the user doesn't have
> > access
> > to the exporting device, he/she can't obtain a pagemap file
> > descriptor.
> > 
> > 2) Lifetime control. The pagemaps are memory hungry, but also take
> > considerable time to set up and tear down.
> > 
> > 3) It's a driver-independent approach.
> > 
> > One could ofc use a different approach by feeding the gpu_madvise()
> > ioctl with a remote device file descriptor and whatever information
> > is
> > needed for the remote device to identify the drm_pagemap. That
> > would
> > not be driver independent, though. Not sure how important that is.
> > 
> > /Thomas
> > 
> > 
> > > Thanks,
> > > Christian.
> > > 
> > > > The implementation for the Xe driver uses dynamic pagemaps
> > > > which
> > > > are created on first
> > > > use and removed 5s after the last reference is gone. Pagemaps
> > > > are
> > > > revoked on
> > > > device unbind, and data is then migrated to system.
> > > > 
> > > > Status:
> > > > This is a POC series. It has been tested with an IGT test soon
> > > > to
> > > > be published, with a
> > > > DG1 drm_pagemap and a BattleMage SVM client. There is separate
> > > > work
> > > > ongoing for the
> > > > gpu_madvise functionality.
> > > > 
> > > > The Xe implementation of the "populate_mm()" callback is
> > > > still rudimentary and doesn't migrate from foreign devices. It
> > > > should be tuned to do
> > > > smarter choices.
> > > > 
> > > > Any feedback appreciated.
> > > > 
> > > > Patch overview:
> > > > Patch 1:
> > > > - Extends the way the Xe driver can compile out SVM support and
> > > > pagemaps.
> > > > Patch 2:
> > > > - Fixes an existing potential UAF in the Xe SVM code.
> > > > Patch 3:
> > > > - Introduces the drm_pagemap.c file and moves drm_pagemap
> > > > functionality to it.
> > > > Patch 4:
> > > > - Adds a populate_mm op to drm_pagemap.
> > > > Patch 5:
> > > > - Implement Xe's version of the populate_mm op.
> > > > Patch 6:
> > > > - Refcount struct drm_pagemap.
> > > > Patch 7:
> > > > - Cleanup patch.
> > > > Patch 8:
> > > > - Add a bo_remove callback for Xe, Used during device unbind.
> > > > Patch 9:
> > > > - Add a drm_pagemap utility to calculate a common owner
> > > > structure
> > > > Patch 10:
> > > > - Adopt GPUSVM to a (sort of) dynamic owner.
> > > > Patch 11:
> > > > - Xe calculates the dev_private owner using the drm_pagemap
> > > > utility.
> > > > Patch 12:
> > > > - Update the Xe page-table code to handle per range mixed
> > > > system /
> > > > device_private placement.
> > > > Patch 13:
> > > > - Modify GPUSVM to allow such placements.
> > > > Patch 14:
> > > > - Add a preferred pagemap to use by the Xe fault handler.
> > > > Patch 15:
> > > > - Add a utility that converts between drm_pagemaps and file-
> > > > descriptors and back.
> > > > Patch 16:
> > > > - Fix Xe so that also devices without fault capability can
> > > > publish
> > > > drm_pagemaps.
> > > > Patch 17:
> > > > - Add the devmem_open UAPI, creating a drm_pagemap file
> > > > descriptor
> > > > from a
> > > >   (device, region) pair.
> > > > Patch 18:
> > > > - (Only for POC) Add an GPU madvise prefer_devmem IOCTL.
> > > > Patch 19:
> > > > - (Only for POC) Implement pcie p2p DMA as a fast interconnect
> > > > and
> > > > test.
> > > > 
> > > > Matthew Brost (1):
> > > >   drm/gpusvm, drm/pagemap: Move migration functionality to
> > > > drm_pagemap
> > > > 
> > > > Thomas Hellström (18):
> > > >   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
> > > >   drm/xe/svm: Fix a potential bo UAF
> > > >   drm/pagemap: Add a populate_mm op
> > > >   drm/xe: Implement and use the drm_pagemap populate_mm op
> > > >   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap
> > > > and
> > > > manage
> > > >     lifetime
> > > >   drm/pagemap: Get rid of the struct
> > > >     drm_pagemap_zdd::device_private_page_owner field
> > > >   drm/xe/bo: Add a bo remove callback
> > > >   drm/pagemap_util: Add a utility to assign an owner to a set
> > > > of
> > > >     interconnected gpus
> > > >   drm/gpusvm, drm/xe: Move the device private owner to the
> > > >     drm_gpusvm_ctx
> > > >   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap
> > > > owner
> > > >   drm/xe: Make the PT code handle placement per PTE rather than
> > > > per
> > > > vma
> > > >     / range
> > > >   drm/gpusvm: Allow mixed mappings
> > > >   drm/xe: Add a preferred dpagemap
> > > >   drm/pagemap/util: Add file descriptors pointing to struct
> > > > drm_pagemap
> > > >   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault
> > > > capable
> > > >     devices
> > > >   drm/xe/uapi: Add the devmem_open ioctl
> > > >   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
> > > >   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
> > > > 
> > > >  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
> > > >  drivers/gpu/drm/Makefile             |   7 +-
> > > >  drivers/gpu/drm/drm_gpusvm.c         | 782 +------------------
> > > > ---
> > > >  drivers/gpu/drm/drm_pagemap.c        | 940
> > > > +++++++++++++++++++++++++++
> > > >  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
> > > >  drivers/gpu/drm/xe/Kconfig           |  24 +-
> > > >  drivers/gpu/drm/xe/Makefile          |   2 +-
> > > >  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
> > > >  drivers/gpu/drm/xe/xe_bo.h           |   2 +
> > > >  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
> > > >  drivers/gpu/drm/xe/xe_device.c       |   8 +
> > > >  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
> > > >  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
> > > >  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
> > > >  drivers/gpu/drm/xe/xe_query.c        |   2 +-
> > > >  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++--
> > > > -
> > > >  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
> > > >  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
> > > >  drivers/gpu/drm/xe/xe_tile.h         |  33 +
> > > >  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
> > > >  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
> > > >  include/drm/drm_gpusvm.h             | 102 +--
> > > >  include/drm/drm_pagemap.h            | 190 +++++-
> > > >  include/drm/drm_pagemap_util.h       |  59 ++
> > > >  include/uapi/drm/xe_drm.h            |  39 ++
> > > >  25 files changed, 2458 insertions(+), 1071 deletions(-)
> > > >  create mode 100644 drivers/gpu/drm/drm_pagemap.c
> > > >  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
> > > >  create mode 100644 include/drm/drm_pagemap_util.h
> > > > 
> 





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux