On 2017å¹´05æ??17æ?¥ 14:57, Michel Dänzer wrote: > On 17/05/17 01:28 PM, zhoucm1 wrote: >> On 2017å¹´05æ??17æ?¥ 11:15, Michel Dänzer wrote: >>> On 17/05/17 12:04 PM, zhoucm1 wrote: >>>> On 2017å¹´05æ??17æ?¥ 09:18, Michel Dänzer wrote: >>>>> On 16/05/17 06:25 PM, Chunming Zhou wrote: >>>>>> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977 >>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com> >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>> index bca1fb5..f3e7525 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>>>>> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, >>>>>> void *data, struct drm_file *filp) >>>>>> case AMDGPU_VM_OP_UNRESERVE_VMID: >>>>>> amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, >>>>>> AMDGPU_GFXHUB); >>>>>> break; >>>>>> + case AMDGPU_VM_OP_RESET: >>>>>> + fpriv->vram_lost_counter = >>>>>> atomic_read(&adev->vram_lost_counter); >>>>>> + break; >>>>> How do you envision the UMDs using this? I can mostly think of them >>>>> calling this ioctl when a context is created or destroyed. But that >>>>> would also allow any other remaining contexts using the same DRM file >>>>> descriptor to use all ioctls again. So, I think there needs to be a >>>>> vram_lost_counter in struct amdgpu_ctx instead of in struct >>>>> amdgpu_fpriv. >>>> struct amdgpu_fpriv for vram_lost_counter is proper place, especially >>>> for ioctl return value. >>>> if you need to reset ctx one by one, we can mark all contexts of that >>>> vm, and then reset by userspace. >>> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any >>> context calls this ioctl, all other contexts using the same file >>> descriptor will also be considered safe again, right? >> Yes, but it really depends on userspace requirement, if you need to >> reset ctx one by one, we can mark all contexts of that vm to guilty, and >> then reset one context by userspace. > Still not sure what you mean by that. > > E.g. what do you mean by "guilty"? I thought that refers to the context > which caused a hang. But it seems like you're using it to refer to any > context which hasn't reacted yet to VRAM contents being lost. When vram is lost, we treat all contexts need to reset. Regards, David Zhou > > Also not sure what you mean by "if you need to reset ctx one by one". > >