On Mon, Mar 5, 2018 at 12:14 PM, Christian König <christian.koenig at amd.com> wrote:
> Am 05.03.2018 um 17:58 schrieb Felix Kuehling:
>>
>> On 2018-03-05 07:17 AM, Christian König wrote:
>>>
>>> Am 04.03.2018 um 03:34 schrieb Felix Kuehling:
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 82 ++++++++++++++++++++++++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  1 +
>>>>   2 files changed, 83 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 5afbc5e..58153ef 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2350,6 +2350,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>       INIT_KFIFO(vm->faults);
>>>>       vm->fault_credit = 16;
>>>> +     vm->vm_context = vm_context;
>>>
>>> I think we should drop the vm_context parameter and all the related
>>> code in amdgpu_vm_init(). But that can be part of a later cleanup
>>> patch as well.
>>
>> Yep. It will be a prerequisite for sharing VMs between graphics and
>> compute. But then CPU vs SDMA page table updates would need to be handled
>> differently, on a per-call basis, and probably with some synchronization
>> between them.
>>
>> Also, we'd need to fix shadow page table handling with CPU updates, or
>> get rid of shadow page tables. I'm not sure why they're needed, TBH.
>
>
> The idea behind shadow page tables is that you have the current state of the
> pipeline if the GPU crashes. Since with CPU updates there is no pipeline,
> they don't make sense at all there.

They are also there to protect against VRAM loss in the case of a GPU reset.

Alex

>
>
>>
>>>> +
>>>>       return 0;
>>>>   error_free_root:
>>>> @@ -2364,6 +2366,86 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>   }
>>>>   /**
>>>> + * amdgpu_vm_make_compute - Turn a GFX VM into a compute VM
>>>> + *
>>>> + * This only works on GFX VMs that don't have any BOs added and no
>>>> + * page tables allocated yet.
>>>> + *
>>>> + * Changes the following VM parameters:
>>>> + * - vm_context
>>>> + * - use_cpu_for_update
>>>> + * - pte_supports_ats
>>>> + * - pasid (old PASID is released, because compute manages its own PASIDs)
>>>> + *
>>>> + * Reinitializes the page directory to reflect the changed ATS
>>>> + * setting. May leave behind an unused shadow BO for the page
>>>> + * directory when switching from SDMA updates to CPU updates.
>>>> + *
>>>> + * Returns 0 for success, -errno for errors.
>>>> + */
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>> +{
>>>> +     bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
>>>> +     int r;
>>>> +
>>>> +     r = amdgpu_bo_reserve(vm->root.base.bo, true);
>>>> +     if (r)
>>>> +             return r;
>>>> +
>>>> +     /* Sanity checks */
>>>> +     if (vm->vm_context == AMDGPU_VM_CONTEXT_COMPUTE) {
>>>> +             /* Can happen if ioctl is interrupted by a signal after
>>>> +              * this function already completed. Just return success.
>>>> +              */
>>>> +             r = 0;
>>>> +             goto error;
>>>> +     }
>>>
>>> Ok, that is actually a show stopper. An interrupted IOCTL should never
>>> have a visible effect.
>>>
>>> Is that just a theoretical issue or did you run into this in practice?
>>
>> It's theoretical. init_kfd_vm in patch 4 reserves the page directory
>> again, validates it and waits for the validation to complete. Both the
>> reservation and the waiting can be interrupted by signals.
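
For illustration, here is a minimal sketch of the interruptibility parameters
being discussed (a hypothetical helper, not the actual init_kfd_vm code; it
assumes only the stock amdgpu_bo_reserve()/dma_fence_wait() semantics):

#include <linux/dma-fence.h>
#include "amdgpu.h"

/* Sketch: reserve a VM's root page directory and wait for a fence in
 * interruptible mode, so a pending signal surfaces as -ERESTARTSYS.
 */
static int reserve_and_wait_sketch(struct amdgpu_vm *vm,
				   struct dma_fence *fence)
{
	int r;

	/* Second argument is no_intr: passing false makes the
	 * reservation interruptible by signals.
	 */
	r = amdgpu_bo_reserve(vm->root.base.bo, false);
	if (r)
		return r;

	/* Second argument is intr: passing true makes the fence wait
	 * interruptible as well.
	 */
	if (fence) {
		long ret = dma_fence_wait(fence, true);

		if (ret < 0)
			r = ret;
	}

	amdgpu_bo_unreserve(vm->root.base.bo);
	return r;
}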
>
>
> No, both have parameters to control whether they are interruptible or not
> for exactly this reason.
>
>> This happens after amdgpu_vm_make_compute has completed.
>>
>> If we wanted to achieve "no visible effect", we'd need to roll back the
>> conversion to a compute VM, which may involve more waiting.
>>
>> We have a similar issue in map_memory_to_gpu. We map on all GPUs and
>> wait for a sync object in the end. If SDMA is used, this allows the SDMA
>> on all GPUs to work concurrently. The wait in the end can be
>> interrupted. Instead of rolling back the mapping on all GPUs (which
>> would require more waiting), we just let the system call return
>> -ERESTARTSYS and make sure the restart doesn't cause any trouble.
>
> Whether waits are interruptible or not is controlled by parameters. At least
> for importing the VM it is probably OK to disable this.
>
>> I've been looking for documentation about the expected behaviour for
>> system calls interrupted by signals. All I can find is this statement in
>> Documentation/kernel-hacking/hacking.rst:
>>>
>>> After you slept you should check if a signal occurred: the Unix/Linux
>>> way of handling signals is to temporarily exit the system call with the
>>> ``-ERESTARTSYS`` error. The system call entry code will switch back to
>>> user context, process the signal handler and then your system call will
>>> be restarted (unless the user disabled that). So you should be prepared
>>> to process the restart, e.g. if you're in the middle of manipulating
>>> some data structure.
>>
>> The same paragraph is also copied in
>> https://www.kernel.org/doc/htmldocs/kernel-hacking/ioctls.html.
>>
>> The key sentence is "should be prepared to process the restart, e.g. if
>> you're in the middle of manipulating some data structure". Rolling back
>> everything would achieve that, but it's not the only way and not the
>> most efficient way.
>>
>> In this case, I'm handling the restart by checking whether the VM is
>> already a compute VM. So on the restart the conversion to a compute VM
>> becomes a no-op.
>>
>> Similarly, in the map_memory_to_gpu ioctl, mapping on a GPU where the
>> memory is already mapped becomes a no-op, which handles the restart
>> case. In this case we even added a specific test for this condition in
>> kfdtest, where we deliberately interrupt a big mapping operation with a
>> signal.
>
> This approach is a clear no-go.
>
> Changes done before interrupting a system call don't necessarily need to be
> reverted, but only if they don't have a user-visible effect.
>
> For example: when we map something we need to allocate multiple page
> directories. It is not necessary to release all previously allocated ones if
> an allocation suddenly says that it was interrupted, but only because those
> page directories are not user accessible.
>
> Similar exceptions can be made by avoiding things when the IOCTL was
> interrupted. E.g. in the VA IOCTL we usually directly submit updating the
> page tables, but if that is interrupted we just return success instead of
> restarting the IOCTL and wait with the update until the first command
> submission.
>
> The easiest way to work around this is to add a separate IOCTL which waits
> for VM updates to complete. Should be trivial to use vm->last_update for
> this.
>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Felix
>>
>>> Regards,
>>> Christian.
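
For illustration, a minimal sketch of the separate "wait for VM updates"
IOCTL body suggested above (the helper name is made up; it assumes
vm->last_update may be sampled while the root page directory is reserved):

#include <linux/dma-fence.h>
#include "amdgpu.h"

/* Sketch: let user space explicitly wait for outstanding page table
 * updates instead of making every update path restart-safe.
 */
static int amdgpu_vm_wait_updates_sketch(struct amdgpu_vm *vm)
{
	struct dma_fence *fence;
	int r;

	/* Interruptible reservation, may return -ERESTARTSYS */
	r = amdgpu_bo_reserve(vm->root.base.bo, false);
	if (r)
		return r;

	/* Take a reference so the wait can happen without holding the
	 * reservation.
	 */
	fence = dma_fence_get(vm->last_update);
	amdgpu_bo_unreserve(vm->root.base.bo);

	if (fence) {
		long ret = dma_fence_wait(fence, true);

		dma_fence_put(fence);
		if (ret < 0)
			return ret;
	}

	return 0;
}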
>>>
>>>> +     if (!RB_EMPTY_ROOT(&vm->va.rb_root) || vm->root.entries) {
>>>> +             r = -EINVAL;
>>>> +             goto error;
>>>> +     }
>>>> +
>>>> +     /* Check if PD needs to be reinitialized and do it before
>>>> +      * changing any other state, in case it fails.
>>>> +      */
>>>> +     if (pte_support_ats != vm->pte_support_ats) {
>>>> +             /* TODO: This depends on a patch series by Christian.
>>>> +              * It's only needed for GFX9 GPUs, which aren't
>>>> +              * supported by upstream KFD yet.
>>>> +             r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo,
>>>> +                                    adev->vm_manager.root_level,
>>>> +                                    pte_support_ats);
>>>> +             if (r)
>>>> +                     goto error;
>>>> +              */
>>>> +     }
>>>> +
>>>> +     /* Update VM state */
>>>> +     vm->vm_context = AMDGPU_VM_CONTEXT_COMPUTE;
>>>> +     vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
>>>> +                                 AMDGPU_VM_USE_CPU_FOR_COMPUTE);
>>>> +     vm->pte_support_ats = pte_support_ats;
>>>> +     DRM_DEBUG_DRIVER("VM update mode is %s\n",
>>>> +                      vm->use_cpu_for_update ? "CPU" : "SDMA");
>>>> +     WARN_ONCE((vm->use_cpu_for_update & !amdgpu_vm_is_large_bar(adev)),
>>>> +               "CPU update of VM recommended only for large BAR system\n");
>>>> +
>>>> +     if (vm->pasid) {
>>>> +             unsigned long flags;
>>>> +
>>>> +             spin_lock_irqsave(&adev->vm_manager.pasid_lock, flags);
>>>> +             idr_remove(&adev->vm_manager.pasid_idr, vm->pasid);
>>>> +             spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
>>>> +
>>>> +             vm->pasid = 0;
>>>> +     }
>>>> +
>>>> +error:
>>>> +     amdgpu_bo_unreserve(vm->root.base.bo);
>>>> +     return r;
>>>> +}
>>>> +
>>>> +/**
>>>>    * amdgpu_vm_free_levels - free PD/PT levels
>>>>    *
>>>>    * @adev: amdgpu device structure
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index 40b4e09..7f50a38 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -263,6 +263,7 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev);
>>>>   void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
>>>>   int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>                      int vm_context, unsigned int pasid);
>>>> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>>   void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>>>>   bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
>>>>                                     unsigned int pasid);
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx