On 2017/05/17 16:40, Christian König wrote:
> On 17.05.2017 at 10:01, Michel Dänzer wrote:
>> On 17/05/17 04:13 PM, zhoucm1 wrote:
>>> On 2017/05/17 14:57, Michel Dänzer wrote:
>>>> On 17/05/17 01:28 PM, zhoucm1 wrote:
>>>>> On 2017/05/17 11:15, Michel Dänzer wrote:
>>>>>> On 17/05/17 12:04 PM, zhoucm1 wrote:
>>>>>>> On 2017/05/17 09:18, Michel Dänzer wrote:
>>>>>>>> On 16/05/17 06:25 PM, Chunming Zhou wrote:
>>>>>>>>> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
>>>>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> index bca1fb5..f3e7525 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>>>>>>>>  	case AMDGPU_VM_OP_UNRESERVE_VMID:
>>>>>>>>>  		amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, AMDGPU_GFXHUB);
>>>>>>>>>  		break;
>>>>>>>>> +	case AMDGPU_VM_OP_RESET:
>>>>>>>>> +		fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
>>>>>>>>> +		break;
>>>>>>>> How do you envision the UMDs using this? I can mostly think of them
>>>>>>>> calling this ioctl when a context is created or destroyed. But that
>>>>>>>> would also allow any other remaining contexts using the same DRM file
>>>>>>>> descriptor to use all ioctls again. So, I think there needs to be a
>>>>>>>> vram_lost_counter in struct amdgpu_ctx instead of in struct
>>>>>>>> amdgpu_fpriv.
>>>>>>> struct amdgpu_fpriv is the proper place for vram_lost_counter,
>>>>>>> especially for the ioctl return value.
>>>>>>> If you need to reset contexts one by one, we can mark all contexts of
>>>>>>> that vm, and then reset them from userspace.
>>>>>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
>>>>>> context calls this ioctl, all other contexts using the same file
>>>>>> descriptor will also be considered safe again, right?
>>>>> Yes, but it really depends on the userspace requirement. If you need to
>>>>> reset contexts one by one, we can mark all contexts of that vm as
>>>>> guilty, and then reset one context at a time from userspace.
>>>> Still not sure what you mean by that.
>>>>
>>>> E.g. what do you mean by "guilty"? I thought that refers to the context
>>>> which caused a hang. But it seems like you're using it to refer to any
>>>> context which hasn't reacted yet to VRAM contents being lost.
>>> When VRAM is lost, we treat all contexts as needing a reset.
>> Essentially, your patches only track VRAM contents being lost per file
>> descriptor, not per context. I'm not sure (rather skeptical) that this
>> is suitable for OpenGL UMDs, since state is usually tracked per context.
>> Marek / Nicolai?
>
> Oh, yeah that's a good point.
>
> The problem with tracking it per context is that Vulkan also wants the
> ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are
> context-less.
>
> But thinking more about this, blocking those two doesn't make much
> sense. The VM content can be restored, and why should we disallow
> reading GPU info?
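
On the UMD usage question above, here is a minimal sketch of how userspace
could acknowledge a VRAM-lost event on its file descriptor with the proposed
op. AMDGPU_VM_OP_RESET and its value are taken from this patch only and are
not in any released amdgpu_drm.h, so treat them as assumptions:

/*
 * Illustration only: AMDGPU_VM_OP_RESET is proposed in this series; the
 * value 3 is an assumption (it would follow AMDGPU_VM_OP_UNRESERVE_VMID).
 */
#include <errno.h>
#include <string.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

#ifndef AMDGPU_VM_OP_RESET
#define AMDGPU_VM_OP_RESET	3	/* assumed value, see above */
#endif

/* Re-arm the DRM file descriptor after userspace has restored its VRAM. */
static int umd_ack_vram_lost(int drm_fd)
{
	union drm_amdgpu_vm vm;

	memset(&vm, 0, sizeof(vm));
	vm.in.op = AMDGPU_VM_OP_RESET;

	/* On success, ioctls on drm_fd stop returning -ENODEV. */
	return drmCommandWriteRead(drm_fd, DRM_AMDGPU_VM, &vm, sizeof(vm));
}

With the per-fd semantics of this patch a UMD would presumably call this once
after restoring its buffers; with Michel's per-context variant the same
acknowledgement would move to a context-level interface instead.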

I can re-paste the Vulkan APIs requiring ENODEV:

"The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST according
to the spec. I tried to provide a list of u/k interfaces that could be
called for each vk API.

vkCreateDevice
    - amdgpu_device_initialize
    - amdgpu_query_gpu_info
vkQueueSubmit
    - amdgpu_cs_submit
vkWaitForFences
    - amdgpu_cs_wait_fences
vkGetEventStatus
vkQueueWaitIdle
vkDeviceWaitIdle
vkGetQueryPoolResults**
    - amdgpu_cs_query_fence_status
vkQueueBindSparse**
    - amdgpu_bo_va_op
    - amdgpu_bo_va_op_raw
vkCreateSwapchainKHR**
vkAcquireNextImageKHR**
vkQueuePresentKHR
    - Not related with a u/k interface.**

** Besides those listed above, I think
amdgpu_cs_signal_sem/amdgpu_cs_wait_sem should respond to GPU reset as
well."

> Christian.
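
For illustration, a rough sketch of how a Vulkan UMD might fold the negative
errno returned by the u/k interfaces listed above into a VkResult. Which
errno actually signals the lost state (ENODEV or something else) is exactly
what this thread is deciding, so the mapping below is an assumption, not the
driver's actual behaviour:

#include <errno.h>
#include <vulkan/vulkan.h>

/* Translate a libdrm_amdgpu return value (0 or -errno) into a VkResult. */
static VkResult vk_result_from_drm_errno(int ret)
{
	if (ret >= 0)
		return VK_SUCCESS;

	switch (-ret) {
	case ENODEV:	/* VRAM/device lost, per this proposal */
		return VK_ERROR_DEVICE_LOST;
	case ENOMEM:	/* kernel allocation failure */
		return VK_ERROR_OUT_OF_HOST_MEMORY;
	default:	/* conservative: treat unknown errors as device loss */
		return VK_ERROR_DEVICE_LOST;
	}
}

A vkQueueSubmit implementation could then end with something like
return vk_result_from_drm_errno(amdgpu_cs_submit(ctx, 0, &request, 1));
where ctx and request are the UMD's own context handle and cs request.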