Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Tue, 27 Nov 2018 10:52:02 +0100

Am 27.11.18 um 01:40 schrieb Deepak Sharma:

On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Monday, November 26, 2018 5:23 PM
To: Sharma, Deepak <Deepak.Sharma@xxxxxxx>; Zhou, David(ChunMing)
<David1.Zhou@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
dereference

Am 26.11.18 um 02:59 schrieb Sharma, Deepak:
在 2018/11/24 2:10, Koenig, Christian 写道:
Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing):
在 2018/11/23 21:30, Koenig, Christian 写道:
Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing):
在 2018/11/22 19:25, Christian König 写道:
Am 22.11.18 um 07:56 schrieb Sharma, Deepak:
when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.
Again, this is clearly incorrect. The my other mails on the
earlier patch.
Sorry for I didn't get your history, but looks from the patch
itself, it is still a valid patch, isn't it?
No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
when the fence is already signaled.

So this patch could totally break userspace because it changes the
behavior when we try to sync to an already signaled fence.
Ah, I got your meaning, how about attached patch?
Yeah something like this, but I would just give the
DRM_SYNCOBJ_CREATE_SIGNALED instead.

I mean that's what this flag is good for isn't it?
Yeah, I give a flag initally when creating patch, but as you know, there is a
swich case not be able to use that flag:
        case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
            fd = get_unused_fd_flags(O_CLOEXEC);
            if (fd < 0) {
                dma_fence_put(fence);
                return fd;
            }

            sync_file = sync_file_create(fence);
            dma_fence_put(fence);
            if (!sync_file) {
                put_unused_fd(fd);
                return -ENOMEM;
            }

            fd_install(fd, sync_file->file);
            info->out.handle = fd;
            return 0;

So I change to stub fence instead.
Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
fence.

We should then probably move the stub fence function into
dma_fence_stub.c under drivers/dma-buf to keep the stuff together.
Yes, you wrap it to review first with your stub fence, we can do it separately first.

-David
-David

I have not applied this patch.
The issue was trying to address is when amdgpu_cs_ioctl() failed due to
low memory (ENOMEM) but userspace chose to proceed and called
amdgpu_cs_fence_to_handle_ioctl().
In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing
NULL pointer dereference, this patch was to avoid that and system panic
But I understand now that its a valid case retuning NULL if fence was already
signaled but need to handle case so avoid kernel panic. Seems David patch
should fix this, I will test it tomorrow.

Mhm, but don't we bail out with an error if we ask for a failed command
submission? If not that sounds like a bug as well.

Christian.

Where do we do that?
I see error
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Did some more debugging,dma_fence_is_array() is causing NULL pointer
dereference call through sync_file_ioctl.

Also I think changes in David patch cant be applied on
amd-staging-drm-next, which all patches I should take to get it correctly?

Mhm, what you say here actually doesn't make much sense.

When the sequence number is invalid because the submission failed 
amdgpu_ctx_get_fence() returns an error:
        if (seq >= centity->sequence) {
                spin_unlock(&ctx->ring_lock);
                return ERR_PTR(-EINVAL);
        }

This error is then handled in amdgpu_cs_fence_to_handle_ioctl():
        fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence);
        if (IS_ERR(fence))
                return PTR_ERR(fence);

So the error handling for this is actually in place.

The only thing that could go wrong is that userspace sends a sequence 
number which is to small.

The correct solution is to either set the flag as I suggested and make 
sync_file_create() capable of handling a NULL fence.

Or we make the stub fence global like David suggested.

Regards,
Christian.


-Deepak
Christian.

-David
If that patch was applied then please revert it immediately.

Christian.

-David
If you have already pushed the patch then please revert.

Christian.

Signed-off-by: Deepak Sharma <Deepak.Sharma@xxxxxxx>
---
         drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
         1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence
*amdgpu_cs_get_fence(struct amdgpu_device *adev,
               fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
             amdgpu_ctx_put(ctx);
+    if(!fence)
+        return ERR_PTR(-EINVAL);
               return fence;
         }
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx