On Wed, 2025-01-29 at 11:52 -0800, Matthew Brost wrote:
> Add documentation for agree upon GPU SVM design principles, current
> status, and future plans.
> 
> v4:
>  - Address Thomas's feedback
> 
> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> ---
>  Documentation/gpu/rfc/gpusvm.rst | 84 ++++++++++++++++++++++++++++++++
>  Documentation/gpu/rfc/index.rst  |  4 ++
>  2 files changed, 88 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/gpusvm.rst
> 
> diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
> new file mode 100644
> index 000000000000..2d88f5981981
> --- /dev/null
> +++ b/Documentation/gpu/rfc/gpusvm.rst
> @@ -0,0 +1,84 @@
> +===============
> +GPU SVM Section
> +===============
> +
> +Agreed upon design principles
> +=============================
> +
> +* migrate_to_ram path
> +  * Rely only on core MM concepts (migration PTEs, page references, and
> +    page locking). The reasoning is that this is not required, can lead
> +    to livelock cases, and is generally not a good idea to seal races
> +    using driver-invented locks.
> +  * No driver specific locks other than locks for hardware interaction
> +    in this path.
> +  * Partial migration is supported (i.e., a subset of pages attempting
> +    to migrate can actually migrate, with only the faulting page
> +    guaranteed to migrate).
> +  * Driver handles mixed migrations via retry loops rather than locking.
> +* Eviction
> +  * Only looking at physical memory data structures and locks as opposed
> +    to looking at virtual memory data structures and locks.
> +  * No looking at mm/vma structs or relying on those being locked.
> +* GPU fault side
> +  * mmap_read only used around core MM functions which require this lock
> +    and should strive to take mmap_read lock only in GPU SVM layer.
> +  * Big retry loop to handle all races with the mmu notifier under the
> +    gpu pagetable locks/mmu notifier range lock/whatever we end up
> +    calling those.
> +  * Races (especially against concurrent eviction or migrate_to_ram)
> +    should not be handled on the fault side by trying to hold locks;
> +    rather, they should be handled using retry loops. One possible
> +    exception is holding a BO's dma-resv lock during the initial
> +    migration to VRAM, as this is a well-defined lock that can be taken
> +    underneath the mmap_read lock.
> +* Physical memory to virtual backpointer
> +  * Does not work, no pointers from physical memory to virtual should
> +    exist.
> +  * Physical memory backpointer (page->zone_device_data) should be
> +    stable from allocation to page free.
> +* GPU pagetable locking
> +  * Notifier lock only protects range tree, pages valid state for a
> +    range (rather than seqno due to wider notifiers), pagetable entries,
> +    and mmu notifier seqno tracking, it is not a global lock to protect
> +    against races.
> +  * All races handled with big retry as mentioned above.
> +
> +Overview of current design
> +==========================
> +
> +Current design is simple as possible to get a working basline in

baseline

With that fixed,
Reviewed-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>

> which can be
> +built upon.
> +
> +.. kernel-doc:: drivers/gpu/drm/xe/drm_gpusvm.c
> +   :doc: Overview
> +   :doc: Locking
> +   :doc: Migrataion
> +   :doc: Partial Unmapping of Ranges
> +   :doc: Examples
> +
> +Possible future design features
> +===============================
> +
> +* Concurrent GPU faults
> +  * CPU faults are concurrent so makes sense to have concurrent GPU
> +    faults.
> +  * Should be possible with fined grained locking in the driver GPU
> +    fault handler.
> +  * No expected GPU SVM changes required.
> +* Ranges with mixed system and device pages
> +  * Can be added if required to drm_gpusvm_get_pages fairly easily.
> +* Multi-GPU support
> +  * Work in progress and patches expected after initially landing on GPU
> +    SVM.
> +  * Ideally can be done with little to no changes to GPU SVM.
> +* Drop ranges in favor of radix tree
> +  * May be desirable for faster notifiers.
> +* Compound device pages
> +  * Nvidia, AMD, and Intel all have agreed expensive core MM functions
> +    in migrate device layer are a performance bottleneck, having
> +    compound device pages should help increase performance by reducing
> +    the number of these expensive calls.
> +* Higher order dma mapping for migration
> +  * 4k dma mapping adversely affects migration performance on Intel
> +    hardware, higher order (2M) dma mapping should help here.
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 476719771eef..396e535377fb 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -16,6 +16,10 @@ host such documentation:
>  * Once the code has landed move all the documentation to the right places in
>    the main core, helper or driver sections.
>  
> +.. toctree::
> +
> +   gpusvm.rst
> +
>  .. toctree::
>  
>     i915_gem_lmem.rst
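
For context, the "big retry loop" on the GPU fault side that the principles
above describe amounts to roughly the following pattern. This is a minimal
sketch, not the drm_gpusvm code: the my_* names and structures are invented
purely for illustration; only the core MM helpers mmu_interval_read_begin()
and mmu_interval_read_retry() are real.

#include <linux/errno.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

/* Illustrative structure only -- not the drm_gpusvm data structures. */
struct my_svm_range {
	struct mmu_interval_notifier notifier;	/* invalidations bump its seq */
	struct mutex *notifier_lock;		/* GPU pagetable/notifier lock */
	/* pages, dma addresses, GPU VA, ... */
};

/* Hypothetical driver hooks standing in for real migration/binding code. */
int my_migrate_to_vram(struct my_svm_range *range);
int my_get_pages(struct my_svm_range *range);
int my_bind_range_to_gpu(struct my_svm_range *range);

int my_handle_gpu_fault(struct my_svm_range *range)
{
	unsigned long seq;
	int err;

again:
	/* Snapshot the notifier sequence number before touching pages. */
	seq = mmu_interval_read_begin(&range->notifier);

	/*
	 * Optionally migrate to VRAM first (a BO dma-resv lock may be held
	 * here); partial migration is acceptable and mixed placement is
	 * resolved by retrying, not by extra driver locks.
	 */
	err = my_migrate_to_vram(range);
	if (err)
		return err;

	/* Collect the pages backing the range, e.g. via hmm_range_fault(). */
	err = my_get_pages(range);
	if (err == -EBUSY)
		goto again;		/* raced with an invalidation */
	if (err)
		return err;

	/* Publish GPU pagetable entries under the notifier lock only. */
	mutex_lock(range->notifier_lock);
	if (mmu_interval_read_retry(&range->notifier, seq)) {
		/* A concurrent invalidation won; retry rather than lock. */
		mutex_unlock(range->notifier_lock);
		goto again;
	}
	err = my_bind_range_to_gpu(range);
	mutex_unlock(range->notifier_lock);

	return err;
}

The point of the sketch is that the only lock held while committing
pagetable entries is the notifier/pagetable lock, and every race
(eviction, migrate_to_ram, notifier invalidation) funnels into the
retry path instead of being sealed with additional driver locks.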