Hi Dave,

Let me step back. When I wrote "shared virtual address space between CPU and all GPU devices is a hard requirement for our system allocator design", I did not mean that this is only Intel's design requirement. Rather, it is a common requirement for Intel, AMD and Nvidia alike. Take a look at the CUDA driver API definition of cuMemAllocManaged (search for this API on https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM); it says: "The pointer is valid on the CPU and on all GPUs in the system that support managed memory." This means the program's virtual address space is shared between the CPU and all GPU devices in the system. The system allocator we are discussing is just one step beyond cuMemAllocManaged: it allows malloc'ed memory to be shared between the CPU and all GPU devices (a minimal sketch contrasting the two models is appended at the end of this mail). I hope we all agree on this point.

With that, I agree with Christian that in the KMD we should keep the driver code per-device instead of managing all devices in one driver instance. Our system allocator design (and xekmd in general) follows this rule: we make xe_vm per-device - one device is *not* aware of another device's address space, as I explained in my previous email.

I started this email thread seeking a single drm_gpuvm instance to cover all GPU devices. I have given up on that approach (at least for now) per Danilo's and Christian's feedback: we will continue to have a per-device drm_gpuvm. I hope this is aligned with Christian, but I will have to wait for his reply to my previous email.

I hope this clarifies things a little.

Regards,
Oak

> -----Original Message-----
> From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of David Airlie
> Sent: Wednesday, January 24, 2024 8:25 PM
> To: Zeng, Oak <oak.zeng@xxxxxxxxx>
> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@xxxxxxxxx>; Thomas.Hellstrom@xxxxxxxxxxxxxxx; Winiarski, Michal <michal.winiarski@xxxxxxxxx>; Felix Kuehling <felix.kuehling@xxxxxxx>; Welty, Brian <brian.welty@xxxxxxxxx>; Shah, Ankur N <ankur.n.shah@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx; Gupta, saurabhg <saurabhg.gupta@xxxxxxxxx>; Danilo Krummrich <dakr@xxxxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; Brost, Matthew <matthew.brost@xxxxxxxxx>; Bommu, Krishnaiah <krishnaiah.bommu@xxxxxxxxx>; Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>; Christian König <christian.koenig@xxxxxxx>
> Subject: Re: Making drm_gpuvm work across gpu devices
>
> > For us, Xekmd doesn't need to know whether it is running on bare metal or in a virtualized environment. Xekmd is always a guest driver. All the virtual addresses used in xekmd are guest virtual addresses. For SVM, we require all the VF devices to share one single address space with the guest CPU program, so any design that works in a bare-metal environment automatically works in a virtualized environment. +@Shah, Ankur N +@Winiarski, Michal to back me up if I am wrong.
> >
> > Again, a shared virtual address space between the CPU and all GPU devices is a hard requirement for our system allocator design (which means malloc'ed memory, CPU stack variables and globals can be directly used in a GPU program; the same requirement as the kfd SVM design). This was aligned with our user space software stack.
>
> Just to make a very general point here (I'm hoping you listen to Christian a bit more and hoping he replies in more detail), but just because you have a system allocator design done, it doesn't in any way enforce the requirements on the kernel driver to accept that design. Bad system design should be pushed back on, not enforced at the implementation stage. It's a trap Intel falls into regularly, since they say "well, we already agreed this design with the userspace team and we can't change it now". This isn't acceptable. Design includes upstream discussion and feedback; if you have misdesigned the system allocator (and I'm not saying you definitely have), and this is pushing back on that, then you have to go fix your system architecture.
>
> KFD was an experiment like this. I pushed back on AMD at the start, saying it was likely a bad plan; we let it go and got a lot of experience in why it was a bad design.
>
> Dave.
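
For reference, below is a minimal, hypothetical sketch (not part of the original mails) contrasting the two memory models discussed at the top of this mail. cuMemAllocManaged, CU_MEM_ATTACH_GLOBAL and the other calls are the real CUDA driver API cited above; the malloc-based path is illustrative only and assumes HW and KMD support for system-allocated memory (e.g. HMM-based SVM), which is exactly what this thread is designing. Error handling and the GPU kernels themselves are omitted.

/*
 * Sketch only: build against the CUDA driver API, e.g. "nvcc svm_sketch.c -lcuda".
 */
#include <cuda.h>       /* CUDA driver API */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr managed;
    size_t sz = 1 << 20;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /*
     * Model 1: cuMemAllocManaged.  The returned address is valid on the
     * CPU and on every GPU in the system that supports managed memory,
     * i.e. one process virtual address space is shared by all of them.
     */
    cuMemAllocManaged(&managed, sz, CU_MEM_ATTACH_GLOBAL);
    memset((void *)(uintptr_t)managed, 0, sz);   /* touch it from the CPU */
    /* ... the same address could be passed to a kernel on any GPU ...    */
    cuMemFree(managed);

    /*
     * Model 2: system allocator, one step beyond cuMemAllocManaged.
     * malloc'ed memory (as well as stack variables and globals) is used
     * in the GPU program as-is; no dedicated allocation call is needed
     * and the driver migrates/maps pages on GPU page fault.
     */
    void *sys = malloc(sz);
    memset(sys, 0, sz);
    /* ... 'sys' would be handed to a GPU kernel directly here ...        */
    free(sys);

    cuCtxDestroy(ctx);
    return 0;
}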