Re: [RFC PATCH 05/28] drm/gpusvm: Add support for GPU Shared Virtual Memory

Hi, Matt

On Fri, 2024-08-30 at 13:58 +0000, Matthew Brost wrote:
> > 
> > So I specifically asked Jason about the performance problem about
> > using many notifiers vs using a single one, and he responded that
> > the problem is slowing down the core mm on invalidations, if the RB
> > tree gets too large to walk. He also mentioned that we should
> > consider core invalidation performance before faulting performance
> > because the latter is so slow anyway we must have the driver stack
> > avoid gpu faults using user-space prefetching and similar
> > techniques.
> > 
> > In particular inserting and removing into the mmu_interval tree is
> > not costly in terms of locking but because of correctness
> > requirements insertion might block on ongoing validations.
> > 
> > So basically what I'm trying to say is that as long as we're using
> > SVM ranges in the way we do (I'm not saying that is wrong at this
> > point,
> 
> If you have been following the mmap write discussions at all, one
> potential fix for removing that hack is a per range migrate mutex
> [1]. This also needs to be considered when / if we try to drop a
> range concept.

I still need to read up on that, and for migration I think the
situation is a bit different, please see below.

> 
> [1]
> https://patchwork.freedesktop.org/patch/610957/?series=137870&rev=1#comment_1111296
> 
> > and I agree that could be fine-tuned later), the benefit of an
> > extra notifier layer is questionable compared to directly inserting
> > the ranges into the mmu_interval_tree. So hence my questions, given
> > those considerations why this additional layer?
> > 
> 
> One thing we can do fairly easily, if you think this is questionable,
> is have an option to size the notifier to range size and wire this to
> the notifier size modparam [2]. Again, once we have apps running it
> would be fairly easy to profile this and see if there is a benefit to
> this large notifier scheme. If there really is none, perhaps then we
> consider ripping this out.
> 
> [2]
> https://patchwork.freedesktop.org/patch/611007/?series=137870&rev=1
> 
> Matt

At this point I'm mostly trying to understand the reasoning behind the
various design choices and why data structures look like they do.

But also considering that the page-table mapping and invalidation is
per (vm, gpu_vm) pair and migration is per (vm, device (device group))
pair, I have really been advocating for sorting out the page-table
mapping and invalidation first, ending up with something that is
lightweight and sufficient for igpu systems, and for avoiding
conflating possible page-table range requirements with migration range
requirements, which might be completely different.

I think the former can be done completely without ranges, having
configurable prefaulting-, invalidation- and notifier granularity,
whereas the latter also introduces migration granularity.
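
As an illustration of the range-free idea, here is a minimal sketch of
deriving the prefault, invalidation and notifier spans for a faulting
address from three independent granularities. All names below are
hypothetical and not taken from the patch series; the granularities are
assumed to be powers of two, and address overflow near the top of the
address space is ignored.

```c
#include <stdint.h>

/* Hypothetical illustration only; nothing here exists in drm_gpusvm. */
struct svm_span {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Hypothetical per-VM tunables, each assumed to be a power of two. */
struct svm_granularity {
	uint64_t prefault;	/* how much to map on a GPU fault */
	uint64_t invalidation;	/* how much to zap on a CPU invalidation */
	uint64_t notifier;	/* size of one interval-notifier chunk */
};

/*
 * Return the naturally aligned chunk of size 'gran' that contains
 * 'addr', clamped to the CPU VMA bounds [vma_start, vma_end).
 */
static struct svm_span svm_span_for_addr(uint64_t addr, uint64_t gran,
					 uint64_t vma_start, uint64_t vma_end)
{
	struct svm_span s;

	s.start = addr & ~(gran - 1);	/* align down to granularity */
	s.end = s.start + gran;
	if (s.start < vma_start)
		s.start = vma_start;
	if (s.end > vma_end)
		s.end = vma_end;
	return s;
}
```

With something like this, svm_span_for_addr(addr, g.prefault, ...) and
svm_span_for_addr(addr, g.notifier, ...) are computed independently, so
no stored range object is needed to tie the two together; a migration
granularity would be an additional, separate knob.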

/Thomas







