On 2018-03-02 01:07 PM, Christian König wrote:
> Am 02.03.2018 um 17:38 schrieb Felix Kuehling:
>> On 2018-03-02 02:43 AM, Christian König wrote:
>>> Hi Felix,
>>>
>>> patch #3 looks good to me in general, but why do you need to cache
>>> the pd_phys_addr?
>>>
>>> The BO already does that anyway, so you can just use
>>> amdgpu_bo_gpu_addr() for this, which also makes some additional
>>> checks on debug builds.
>> Do you mean amdgpu_bo_gpu_offset? That one requires the reservation
>> to be locked unless it's pinned.
>
> Correct, well, that's why I suggested it.
>
>> We are caching the PD physical address here to get around a tricky
>> circular locking issue we ran into under memory pressure with
>> evictions. I don't remember all the details, but I do remember that
>> the deadlock involved fences, which aren't tracked by the lock
>> dependency checker, so it was particularly tricky to nail down.
>> Avoiding the need to lock the page directory reservation for finding
>> out its physical address breaks the lock cycle.
>
> OK, so what you do is you get the PD address after you validated the
> BOs and cache it until you need it in the hardware setup?

Correct. There is a KFD-KGD interface that queries the PD address from
the VM. The address gets put into the MAP_PROCESS packet in the HWS
runlist. After that, the runlist effectively caches the same address
until an eviction causes a preemption.

> If yes, then that would be valid, because we do exactly the same in
> the job during command submission.
>
>>> patch #5: well, mixing power management into the VM functions is a
>>> clear NAK.
>> This part won't be in my initial upstream push for GPUVM.
>>
>>> That certainly doesn't belong there, but we can have a counter for
>>> how many compute VMs we have in the manager.
>>> amdgpu_vm_make_compute() can then return whether this was the first
>>> VM to become a compute VM or not.
>> We currently trigger compute power profile switching based on the
>> existence of compute VMs. It was a convenient hook where amdgpu
>> finds out about the existence of compute work. If that's not
>> acceptable, we can look into moving it elsewhere, possibly using a
>> new KFD2KGD interface.
>
> As I said, the VM code can still count how many compute VMs there
> are; the issue is that it should not make power management decisions
> based on that.
>
> The caller which triggers the switch of the VM to be a compute VM
> should make that decision.

Makes sense.

Regards,
  Felix

> Christian.
>
>>> The rest of the patch looks good to me.
>> Thanks,
>>   Felix
>>
>>> Regards,
>>> Christian.
>>>
>>> Am 01.03.2018 um 23:58 schrieb Felix Kuehling:
>>>> Hi Christian,
>>>>
>>>> I have a working patch series against amd-kfd-staging that lets
>>>> KFD use VMs from render node FDs, as we discussed. There are two
>>>> patches in that series that touch amdgpu_vm.[ch] that I'd like
>>>> your feedback on before I commit the changes to amd-kfd-staging
>>>> and include them in my upstream patch series for KFD GPUVM
>>>> support. See attached.
>>>>
>>>> Thanks,
>>>>    Felix
>>>>
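
For readers following along, below is a minimal sketch of the
PD-address caching pattern discussed in this thread. It is only an
illustration: apart from amdgpu_bo_gpu_offset() and struct amdgpu_bo,
every name (the example_* struct, fields and helpers) is hypothetical
and does not come from the actual patch series.

    #include "amdgpu_object.h"	/* amdgpu_bo_gpu_offset() */

    /* Hypothetical container; the real patches keep this state in the
     * amdgpu/KFD process data. */
    struct example_compute_vm {
    	struct amdgpu_bo *pd_bo;	/* page directory BO */
    	u64 pd_phys_addr;		/* cached GPU address of the PD */
    };

    /* Called right after the PD BO has been validated, while its
     * reservation is still held, so amdgpu_bo_gpu_offset() is safe to
     * call here. */
    static void example_cache_pd_addr(struct example_compute_vm *vm)
    {
    	vm->pd_phys_addr = amdgpu_bo_gpu_offset(vm->pd_bo);
    }

    /* Called later, e.g. when building the MAP_PROCESS packet for the
     * HWS runlist.  No reservation is taken here, which is what breaks
     * the lock cycle under memory pressure. */
    static u64 example_get_pd_addr(struct example_compute_vm *vm)
    {
    	return vm->pd_phys_addr;
    }

The cached value can stay valid because an eviction causes a
preemption and a new runlist anyway, matching the behaviour described
above.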
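
Similarly, a rough sketch of the split suggested for patch #5: the VM
code only counts compute VMs and reports whether one is the first, and
the caller decides about the power profile. All names are again made
up for illustration (it reuses struct example_compute_vm from the
sketch above), and the real amdgpu_vm_make_compute() may look quite
different.

    #include <linux/atomic.h>

    static atomic_t example_compute_vm_count = ATOMIC_INIT(0);

    /* Stand-in for the real power-profile switch; details elided. */
    static void example_enter_compute_power_profile(void)
    {
    }

    /* VM side: convert the VM and report whether it was the first
     * compute VM, but make no power management decision here. */
    static bool example_vm_make_compute(struct example_compute_vm *vm)
    {
    	/* ... actual conversion of the VM to a compute VM elided ... */
    	return atomic_inc_return(&example_compute_vm_count) == 1;
    }

    /* Caller side (e.g. the KFD code acquiring the VM): this is where
     * the power management decision belongs. */
    static void example_acquire_compute_vm(struct example_compute_vm *vm)
    {
    	if (example_vm_make_compute(vm))
    		example_enter_compute_power_profile();
    }

A matching decrement when the last compute VM is released would mirror
this for switching back; that side is omitted here.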