On Tue, 2024-10-15 at 20:25 -0700, Matthew Brost wrote:
> Add documentation for agree upon GPU SVM design principles, current
> status, and future plans.
> 
> Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
> ---
>  Documentation/gpu/rfc/gpusvm.rst | 70 ++++++++++++++++++++++++++++++++
>  Documentation/gpu/rfc/index.rst  |  4 ++
>  2 files changed, 74 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/gpusvm.rst
> 
> diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
> new file mode 100644
> index 000000000000..2d3f79a6c30a
> --- /dev/null
> +++ b/Documentation/gpu/rfc/gpusvm.rst
> @@ -0,0 +1,70 @@
> +===============
> +GPU SVM Section
> +===============
> +
> +Agreed upon design principles
> +=============================
> +
> +* migrate_to_ram path
> +  * Rely on core MM concepts (migration ptes, page refs, and page locking)
> +    only
> +  * No driver specific locks other than locks for hardware interaction in
> +    this path

We have previously been discussing the bo lock to protect the bo from
eviction during migrate, if the vram allocation is bo-based. This is a
cross-driver lock with a well-established locking order, and I suggest
we allow this.

Apart from that, I think the above statement needs some elaboration:
what is the problem we are trying to avoid with driver-specific locks,
written so that it's easy to understand why it's a bad idea?

> +  * Partial migration is supported

Exactly what do we mean by partial migration?

> +  * Driver handles mixed migrations via retry loops rather than locking
> +* Eviction
> +  * Only looking at physical memory datastructures and locks

as opposed to...

> +  * No looking at mm/vma structs or relying on those being locked

We're violating this with the current implementation, aren't we?

> +* GPU fault side
> +  * mmap_read only used around core MM functions which require this lock
> +  * Big retry loop to handle all races with the mmu notifier under the gpu
> +    pagetable locks/mmu notifier range lock/whatever we end up calling
> +    those
> +  * Races (especially against concurrent eviction/migrate_to_ram) should
> +    not be handled on the fault side by trying to hold locks

This actually contradicts my comment above about using the bo lock to
block eviction here. The alternative would be to pin vram allocations
during migration until the mm_struct has references on the allocation,
but it would be good to clarify exactly why locking here is a bad idea,
and why we can't rely on lockdep.

> +* Physical memory to virtual backpointer
> +  * Does not work, no pointers from physical memory to virtual should
> +    exist

We actually still have the private zdd structure, although strictly it
points not to virtual memory but to the allocation metadata. Is it
verified that the zone_device_data field is allowed to be modified by
the pagemap between allocation and migration?

> +* GPU pagetable locking
> +  * Notifier lock only protects range tree, pages, pagetable entries, and
> +    mmu notifier seqno tracking, it is not a global lock to protect
> +    against races
> +  * All races handled with big retry as mentioned above

Consider adding a note here about "pages valid" for subranges rather
than relying on the wider notifier seqno, i.e. a subrange can be valid
even if the notifier seqno says otherwise.

Performance considerations: perhaps mention that notifier (core mm)
performance is more important than gpu fault (driver) performance when
considering optimizations that improve one at the cost of the other?
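Maybe also spell the retry pattern itself out in the document. As I
read it, the big retry loop on the fault side boils down to something
like the hand-written sketch below (not the actual drm_gpusvm code;
struct my_range and the my_*() helpers are made-up placeholders, only
the mmu_interval_read_begin()/_retry() calls are the real core MM API):

/* Sketch of the fault-side retry loop, placeholder names throughout. */
static int my_gpu_fault(struct my_range *range)
{
	unsigned long seq;
	int err;

retry:
	seq = mmu_interval_read_begin(&range->notifier);

	/*
	 * Collect pages; mmap_read_lock() is only taken inside the
	 * core MM helpers that require it (e.g. hmm_range_fault()).
	 */
	err = my_get_pages(range);
	if (err)
		return err;

	my_pt_lock(range);
	if (mmu_interval_read_retry(&range->notifier, seq)) {
		/*
		 * A notifier ran in between: drop everything and retry
		 * rather than trying to hold locks across the race.
		 */
		my_pt_unlock(range);
		my_put_pages(range);
		goto retry;
	}
	my_bind_pages(range);	/* program the GPU pagetables */
	my_pt_unlock(range);

	return 0;
}

Having something like that spelled out would make it easier to explain
why holding driver locks against concurrent eviction/migrate_to_ram on
the fault side isn't needed.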
> +
> +Overview of current design
> +==========================
> +
> +Current design is simple as possible to get a working basline in
> which can built

can be built

> +upon.
> +
> +.. kernel-doc:: drivers/gpu/drm/xe/drm_gpusvm.c
> +   :doc: Overview
> +   :doc: Locking
> +   :doc: Migrataion
> +   :doc: Partial Unmapping of Ranges
> +   :doc: Examples
> +
> +Possible future design features
> +===============================
> +
> +* Concurrent GPU faults
> +  * CPU faults are concurrent so makes sense to have concurrent GPU faults
> +  * Should be possible with fined grained locking in the driver GPU
> +    fault handler
> +  * No expected GPU SVM changes required
> +* Ranges with mixed system and device pages
> +  * Can be added if required to drm_gpusvm_get_pages fairly easily
> +* Multi-GPU support
> +  * Work in progress and patches expected after initially landing on GPU
> +    SVM
> +  * Ideally can be done with little to no changes to GPU SVM
> +* Drop ranges in favor of radix tree
> +  * May be desirable for faster notifiers
> +* Compound device pages
> +  * Nvidia, AMD, and Intel all have agreed expensive core MM functions in
> +    migrate device layer are a performance bottleneck, having compound
> +    device pages should help increase performance by reducing the number
> +    of these expensive calls
> +* Higher order dma mapping for migration
> +  * 4k dma mapping adversely affects migration performance on Intel
> +    hardware, higher order (2M) dma mapping should help here
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> index 476719771eef..396e535377fb 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -16,6 +16,10 @@ host such documentation:
>  * Once the code has landed move all the documentation to the right places in
>    the main core, helper or driver sections.
>  
> +.. toctree::
> +
> +   gpusvm.rst
> +
>  .. toctree::
>  
>     i915_gem_lmem.rst

Thanks,
Thomas