Fixed one typo: "GPU VA != GPU VA" should be "GPU VA can != CPU VA"

> -----Original Message-----
> From: Zeng, Oak
> Sent: Wednesday, January 31, 2024 3:17 PM
> To: Daniel Vetter <daniel@xxxxxxxx>; David Airlie <airlied@xxxxxxxxxx>
> Cc: Christian König <christian.koenig@xxxxxxx>; Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>;
> Brost, Matthew <matthew.brost@xxxxxxxxx>; Felix Kuehling <felix.kuehling@xxxxxxx>;
> Welty, Brian <brian.welty@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> Ghimiray, Himal Prasad <himal.prasad.ghimiray@xxxxxxxxx>; Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>;
> Gupta, saurabhg <saurabhg.gupta@xxxxxxxxx>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>;
> intel-xe@xxxxxxxxxxxxxxxxxxxxx; Danilo Krummrich <dakr@xxxxxxxxxx>; Shah, Ankur N <ankur.n.shah@xxxxxxxxx>;
> jglisse@xxxxxxxxxx; rcampbell@xxxxxxxxxx; apopple@xxxxxxxxxx
> Subject: RE: Making drm_gpuvm work across gpu devices
>
> Hi Sima, Dave,
>
> I am well aware the nouveau driver is not what Nvidia does with their customers. The key argument is: can we move forward with the concept of a shared virtual address space between CPU and GPU? This is the foundation of HMM. We already have split address space support with other driver APIs. SVM, as its name says, means shared address space. Are we allowed to implement another driver model to make SVM work, alongside the other APIs that support split address spaces? Those two schemes can co-exist in harmony. We actually have real use cases that use both models in one application.
>
> Hi Christian, Thomas,
>
> In your scheme, GPU VA can != CPU VA. This does introduce some flexibility, but this scheme alone doesn't solve the proxy process/para-virtualization problem. You still need a second mechanism to partition the GPU VA space between guest process1 and guest process2, because the proxy process (or the host hypervisor, whatever you call it) uses one single GPU page table for all the guest/client processes, so the GPU VAs of different guest processes can't overlap. If this second mechanism exists, we can of course use the same mechanism to partition the CPU VA space between guest processes as well; then we can still use a shared VA between CPU and GPU inside one process, while process1's and process2's address spaces (for both CPU and GPU) don't overlap. This second mechanism is the key to solving the proxy process problem, not the flexibility you introduced.
>
> In practice, your scheme also has the risk of running out of process address space, because you have to partition the whole address space between processes. Clearly, letting each guest process own the whole process address space and using separate GPU/CPU page tables for different processes is a better solution than using a single page table and partitioning the process address space between processes.
>
> For Intel GPUs, para-virtualization (XenGT, see https://github.com/intel/XenGT-Preview-kernel; it is a similar idea to the proxy process in Felix's email, and both are SW-based GPU virtualization technologies) is an old project. It has been replaced with HW-accelerated SRIOV/system virtualization, and XenGT was abandoned a long time ago. So, agreed, your scheme adds some flexibility. The question is: do we have a valid use case for that flexibility? I don't see a single one ATM.
>
> I also looked into how to implement your scheme. You basically reject the very foundation of the HMM design, which is a shared address space between CPU and GPU. In your scheme, GPU VA = CPU VA + offset. In every single place where the driver needs to call HMM facilities such as hmm_range_fault or migrate_vma_setup, and in the mmu notifier callback, you need to offset the GPU VA to get a CPU VA.
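> To make that concrete, here is a minimal sketch of what every such call site would look like. The struct name, the gpu_va_offset field and the helper below are made up for illustration only (and the mmap locking plus notifier sequence/retry handling are omitted); this is not actual driver code:
>
>     #include <linux/hmm.h>
>     #include <linux/mmu_notifier.h>
>
>     /* Hypothetical per-range state for a "GPU VA = CPU VA + offset" scheme. */
>     struct demo_vma {
>             struct mmu_interval_notifier notifier;
>             u64 gpu_va_offset;      /* GPU VA minus CPU VA for this range */
>     };
>
>     /* Fault in the CPU pages backing npages of GPU VA starting at gpu_start. */
>     static int demo_populate_range(struct demo_vma *vma, u64 gpu_start,
>                                    unsigned long npages, unsigned long *pfns)
>     {
>             /* Every HMM call site first translates GPU VA back to CPU VA. */
>             unsigned long cpu_start = gpu_start - vma->gpu_va_offset;
>             struct hmm_range range = {
>                     .notifier = &vma->notifier,
>                     .start = cpu_start,
>                     .end = cpu_start + (npages << PAGE_SHIFT),
>                     .hmm_pfns = pfns,
>                     .default_flags = HMM_PFN_REQ_FAULT,
>             };
>
>             /* Sketch only: locking and mmu notifier retry loop are omitted. */
>             return hmm_range_fault(&range);
>     }
>
> With a true shared address space that translation simply disappears, because gpu_start already is the CPU address you pass to hmm_range_fault.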
> From an application writer's perspective, whenever he wants to use a CPU pointer in his GPU program, he has to add that offset. Do you think this is awkward?
>
> Finally, to implement SVM, we need to implement some memory hint API that applies to a virtual address range across all GPU devices. For example, the user would say: for this virtual address range, I prefer the backing store memory to be on GPU deviceX (because the user knows deviceX will use this address range much more than the other GPU devices or the CPU). It doesn't make sense to me to make such an API per-device. For example, if you tell device A that the preferred memory location is device B's memory, that doesn't sound correct to me, because in your scheme device A is not even aware of the existence of device B, right?
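> Just to illustrate the kind of interface I have in mind (this is a purely hypothetical sketch, not an existing or proposed uapi; the struct and field names are invented), the hint would key off the shared virtual address range rather than a per-device handle:
>
>     #include <linux/types.h>
>
>     /* Hypothetical process-wide SVM placement hint, madvise()-like in spirit. */
>     struct demo_svm_preferred_loc {
>             __u64 start;            /* start of the range; CPU VA == GPU VA    */
>             __u64 length;           /* length of the range in bytes            */
>             __u32 preferred_device; /* index of the device whose memory should
>                                      * back this range, or ~0u for system RAM  */
>             __u32 flags;            /* reserved, must be zero                  */
>     };
>
> Because the address range means the same thing to every device in the process, one call can steer the backing store for all of them; with per-device GPU VA spaces there is no common key to express "this range" across devices.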
> Regards,
> Oak
>
> > -----Original Message-----
> > From: Daniel Vetter <daniel@xxxxxxxx>
> > Sent: Wednesday, January 31, 2024 4:15 AM
> > To: David Airlie <airlied@xxxxxxxxxx>
> > Cc: Zeng, Oak <oak.zeng@xxxxxxxxx>; Christian König <christian.koenig@xxxxxxx>;
> > Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>;
> > Brost, Matthew <matthew.brost@xxxxxxxxx>; Felix Kuehling <felix.kuehling@xxxxxxx>;
> > Welty, Brian <brian.welty@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> > Ghimiray, Himal Prasad <himal.prasad.ghimiray@xxxxxxxxx>; Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>;
> > Gupta, saurabhg <saurabhg.gupta@xxxxxxxxx>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>;
> > intel-xe@xxxxxxxxxxxxxxxxxxxxx; Danilo Krummrich <dakr@xxxxxxxxxx>; Shah, Ankur N <ankur.n.shah@xxxxxxxxx>;
> > jglisse@xxxxxxxxxx; rcampbell@xxxxxxxxxx; apopple@xxxxxxxxxx
> > Subject: Re: Making drm_gpuvm work across gpu devices
> >
> > On Wed, Jan 31, 2024 at 09:12:39AM +1000, David Airlie wrote:
> > > On Wed, Jan 31, 2024 at 8:29 AM Zeng, Oak <oak.zeng@xxxxxxxxx> wrote:
> > > >
> > > > Hi Christian,
> > > >
> > > > The Nvidia nouveau driver uses exactly the same concept of SVM with HMM: GPU addresses in the same process are exactly the same as the CPU virtual addresses. It is already in the upstream Linux kernel. We at Intel just follow the same direction for our customers. Why are we not allowed to?
> > >
> > > Oak, this isn't how upstream works; you don't get to appeal to customer or internal design. nouveau isn't NVIDIA's, and it certainly isn't something NVIDIA would ever suggest for their customers. We also likely wouldn't just accept NVIDIA's current solution upstream without some serious discussions. The implementation in nouveau was more of a sample HMM use case than a serious implementation. I suspect that if we do get down the road of making nouveau an actual compute driver for SVM etc., then it would have to change severely.
> >
> > Yeah, on the nouveau hmm code specifically, my gut-feeling impression is that we didn't really make friends with that among core kernel maintainers. It's a bit too much of just a tech demo to be able to merge the hmm core apis for nvidia's out-of-tree driver.
> >
> > Also, a few years of learning and experience gaining happened meanwhile - you always have to look at an api design in the context of when it was designed, and that context changes all the time.
> >
> > Cheers, Sima
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch