Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v6

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Wed, 6 Feb 2019 10:01:31 +0100

Am 05.02.19 um 22:56 schrieb Yang, Philip:
Hi Christian,

I will submit new patch for review, my comments embedded inline below.

Thanks,
Philip

On 2019-02-05 1:09 p.m., Koenig, Christian wrote:
Am 05.02.19 um 18:25 schrieb Yang, Philip:
[SNIP]+
+    if (r == -ERESTARTSYS) {
+        if (!--tries) {
+            DRM_ERROR("Possible deadlock? Retry too many times\n");
+            return -EDEADLK;
+        }
+        goto restart;
You really need to restart the IOCTL to potentially handle signals.

But since the calling code correctly handles this already you can just
drop this change as far as I can see.

I agree that we should return -ERESTARTSYS to upper layer to handle signals.

But I do not find upper layers handle -ERESTARTSYS in the entire calling
path, ksys_ioctl -> do_vfs_ioctl -> amdgpu_drm_ioctl -> drm->ioctl ->
drm_ioctl_kernel -> amdgpu_cs_ioctl. The error code returns to
application. I confirm it, libdrm userptr test application calling
amdgpu_cs_ioctl return code is -512, which is -ERESTARTSYS.

So application should handle -ERESTARTSYS to restart the ioctl, but
libdrm userptr test application doesn't handle this. This causes the
test failed.
This is a bug in the test cases then.

-ERESTARTSYS can happen at any time during interruptible waiting and it
is mandatory for the upper layer to handle it correctly.

-ERESTARTSYS can be returned only when signal is pending, signal handler
will translate ERESTARTSYS to EINTR, drmIoctl in libdrm does handle
EINTR and restart the ioctl. The test cases are ok.

Interesting, I thought that this was handled unconditionally if a signal 
is pending or not. But that could as well be architecture specific.

Driver fail path should not return ERESTARTSYS to user space. The new
patch, I change amdgpu_cs_submit to return -EAGAIN if userptr is
updated, and amdgpu_cs_ioctl redo the ioctl only if error code is
-EAGAIN. ERESTARTSYS error code returns to user space for signal handle
as before.

Good idea, that is probably cleaner. EGAIN and EINTR should have the 
same effect in drmIoctl IIRC.

Below are details of userptr path difference. For the previous path,
libdrm test always goes to step 2, step 3 never trigger. So it never
return -ERESTARTSYS, but in theory, this could happen.

For HMM path, the test always goes to step 3, we have to handle this
case inside amdgpu_cs_ioctl. Maybe I can change amdgpu_cs_submit to
return -EBUSY, then restart the ioctl inside amdgpu_cs_ioctl. I will
submit new patch.
Clearly a NAK, this won't work correctly.

I don't understand your concern, may you explain the reason?

When some underlying code returns ERESTARTSYS we *MUST* forward that to 
the upper layer.

Handling it in a loop or replacing it manually with EINTR or EGAIN will 
break signal handling.

This is suggested quite often, but can end up in an endless loop inside 
the kernel which is quite ugly.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx