On 17/05/17 05:46 PM, zhoucm1 wrote:
>
> On 2017-05-17 16:40, Christian König wrote:
>> On 17.05.2017 at 10:01, Michel Dänzer wrote:
>>> On 17/05/17 04:13 PM, zhoucm1 wrote:
>>>> On 2017-05-17 14:57, Michel Dänzer wrote:
>>>>> On 17/05/17 01:28 PM, zhoucm1 wrote:
>>>>>> On 2017-05-17 11:15, Michel Dänzer wrote:
>>>>>>> On 17/05/17 12:04 PM, zhoucm1 wrote:
>>>>>>>> On 2017-05-17 09:18, Michel Dänzer wrote:
>>>>>>>>> On 16/05/17 06:25 PM, Chunming Zhou wrote:
>>>>>>>>>> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
>>>>>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>>> index bca1fb5..f3e7525 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>>> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>>>>>>>>>  	case AMDGPU_VM_OP_UNRESERVE_VMID:
>>>>>>>>>>  		amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, AMDGPU_GFXHUB);
>>>>>>>>>>  		break;
>>>>>>>>>> +	case AMDGPU_VM_OP_RESET:
>>>>>>>>>> +		fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
>>>>>>>>>> +		break;
>>>>>>>>>
>>>>>>>>> How do you envision the UMDs using this? I can mostly think of
>>>>>>>>> them calling this ioctl when a context is created or destroyed.
>>>>>>>>> But that would also allow any other remaining contexts using the
>>>>>>>>> same DRM file descriptor to use all ioctls again. So, I think
>>>>>>>>> there needs to be a vram_lost_counter in struct amdgpu_ctx
>>>>>>>>> instead of in struct amdgpu_fpriv.
>>>>>>>>
>>>>>>>> struct amdgpu_fpriv is the proper place for vram_lost_counter,
>>>>>>>> especially for the ioctl return value. If you need to reset
>>>>>>>> contexts one by one, we can mark all contexts of that VM, and
>>>>>>>> then userspace can reset them.
>>>>>>>
>>>>>>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
>>>>>>> context calls this ioctl, all other contexts using the same file
>>>>>>> descriptor will also be considered safe again, right?
>>>>>>
>>>>>> Yes, but it really depends on the userspace requirements: if you
>>>>>> need to reset contexts one by one, we can mark all contexts of that
>>>>>> VM as guilty, and then userspace can reset one context at a time.
>>>>>
>>>>> Still not sure what you mean by that.
>>>>>
>>>>> E.g. what do you mean by "guilty"? I thought that refers to the
>>>>> context which caused a hang. But it seems like you're using it to
>>>>> refer to any context which hasn't reacted yet to VRAM contents being
>>>>> lost.
>>>>
>>>> When VRAM is lost, we treat all contexts as needing a reset.
>>>
>>> Essentially, your patches only track VRAM contents being lost per file
>>> descriptor, not per context. I'm not sure (rather skeptical) that this
>>> is suitable for OpenGL UMDs, since state is usually tracked per
>>> context. Marek / Nicolai?
>>
>> Oh, yeah, that's a good point.
>>
>> The problem with tracking it per context is that Vulkan also wants the
>> ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are
>> context-less.
>>
>> But thinking more about this, blocking those two doesn't make much
>> sense. The VM contents can be restored, and why should we disallow
>> reading GPU info?
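
To illustrate what I was getting at with per-context tracking, here is a
rough, untested sketch (the field placement and helper are made up for
illustration, they're not part of this patch):

struct amdgpu_ctx {
	/* ... existing members ... */
	/* snapshot of adev->vram_lost_counter taken at context creation */
	uint32_t vram_lost_counter;
};

/* true if VRAM contents were lost after this context was created */
static bool amdgpu_ctx_vram_lost(struct amdgpu_device *adev,
				 struct amdgpu_ctx *ctx)
{
	return ctx->vram_lost_counter !=
	       atomic_read(&adev->vram_lost_counter);
}

Submission paths could then check amdgpu_ctx_vram_lost() only for the
context at hand, while context-less ioctls like amdgpu_gem_va_ioctl()
and amdgpu_info_ioctl() wouldn't need to check anything, per your point
above.
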
> I can re-paste the Vulkan APIs requiring ENODEV:
> "
> The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST
> according to the spec.
>
> I tried to provide a list of u/k interfaces that could be called for
> each vk API:
>
> vkCreateDevice
> - amdgpu_device_initialize
> - amdgpu_query_gpu_info
[...]
> vkQueueBindSparse
> - amdgpu_bo_va_op
> - amdgpu_bo_va_op_raw

So these *can* return VK_ERROR_DEVICE_LOST, but do they *have to* do
that after a reset? :)
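
For reference, this is the userspace flow I'd expect with the proposed
op, as an untested sketch (AMDGPU_VM_OP_RESET is the new op from this
patch; union drm_amdgpu_vm and drmCommandWriteRead() are the existing
interfaces):

#include <stdio.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

/* Once the UMD has restored everything it needs in VRAM, acknowledge
 * the loss so that ioctls on this fd stop returning ENODEV. */
static void ack_vram_lost(int fd)
{
	union drm_amdgpu_vm vm_op;

	memset(&vm_op, 0, sizeof(vm_op));
	vm_op.in.op = AMDGPU_VM_OP_RESET; /* op added by this patch */

	if (drmCommandWriteRead(fd, DRM_AMDGPU_VM, &vm_op, sizeof(vm_op)))
		fprintf(stderr, "acknowledging VRAM loss failed\n");
}

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer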