-----Original Message----- From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König Sent: Tuesday, November 27, 2018 1:52 AM To: Sharma, Deepak <Deepak.Sharma@xxxxxxx>; Zhou, David(ChunMing) <David1.Zhou@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Sharma, Deepak <Deepak.Sharma@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference Am 27.11.18 um 01:40 schrieb Deepak Sharma: > > On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote: >> >>> -----Original Message----- >>> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> >>> Sent: Monday, November 26, 2018 5:23 PM >>> To: Sharma, Deepak <Deepak.Sharma@xxxxxxx>; Zhou, David(ChunMing) >>> <David1.Zhou@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer >>> dereference >>> >>> Am 26.11.18 um 02:59 schrieb Sharma, Deepak: >>>> 在 2018/11/24 2:10, Koenig, Christian 写道: >>>>> Am 23.11.18 um 15:10 schrieb Zhou, David(ChunMing): >>>>>> 在 2018/11/23 21:30, Koenig, Christian 写道: >>>>>>> Am 23.11.18 um 14:27 schrieb Zhou, David(ChunMing): >>>>>>>> 在 2018/11/22 19:25, Christian König 写道: >>>>>>>>> Am 22.11.18 um 07:56 schrieb Sharma, Deepak: >>>>>>>>>> when returned fence is not valid mostly due to userspace ignored >>>>>>>>>> previous error causes NULL pointer dereference. >>>>>>>>> Again, this is clearly incorrect. The my other mails on the >>>>>>>>> earlier patch. >>>>>>>> Sorry for I didn't get your history, but looks from the patch >>>>>>>> itself, it is still a valid patch, isn't it? >>>>>>> No, the semantic of amdgpu_ctx_get_fence() is that we return NULL >>>>>>> when the fence is already signaled. >>>>>>> >>>>>>> So this patch could totally break userspace because it changes the >>>>>>> behavior when we try to sync to an already signaled fence. >>>>>> Ah, I got your meaning, how about attached patch? >>>>> Yeah something like this, but I would just give the >>>>> DRM_SYNCOBJ_CREATE_SIGNALED instead. >>>>> >>>>> I mean that's what this flag is good for isn't it? >>>> Yeah, I give a flag initally when creating patch, but as you know, there is a >>> swich case not be able to use that flag: >>>> case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD: >>>> fd = get_unused_fd_flags(O_CLOEXEC); >>>> if (fd < 0) { >>>> dma_fence_put(fence); >>>> return fd; >>>> } >>>> >>>> sync_file = sync_file_create(fence); >>>> dma_fence_put(fence); >>>> if (!sync_file) { >>>> put_unused_fd(fd); >>>> return -ENOMEM; >>>> } >>>> >>>> fd_install(fd, sync_file->file); >>>> info->out.handle = fd; >>>> return 0; >>>> >>>> So I change to stub fence instead. >>> Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL >>> fence. >>> >>> We should then probably move the stub fence function into >>> dma_fence_stub.c under drivers/dma-buf to keep the stuff together. >> Yes, you wrap it to review first with your stub fence, we can do it separately first. >> >> -David >>>> -David >>>> >>>> I have not applied this patch. >>>> The issue was trying to address is when amdgpu_cs_ioctl() failed due to >>> low memory (ENOMEM) but userspace chose to proceed and called >>> amdgpu_cs_fence_to_handle_ioctl(). >>>> In amdgpu_cs_fence_to_handle_ioctl(), fence is null and later causing >>>> NULL pointer dereference, this patch was to avoid that and system panic >>> But I understand now that its a valid case retuning NULL if fence was already >>> signaled but need to handle case so avoid kernel panic. Seems David patch >>> should fix this, I will test it tomorrow. >>> >>> Mhm, but don't we bail out with an error if we ask for a failed command >>> submission? If not that sounds like a bug as well. >>> >>> Christian. >>> > Where do we do that? > I see error > [drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed. > [drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission! > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 > Did some more debugging,dma_fence_is_array() is causing NULL pointer > dereference call through sync_file_ioctl. > > Also I think changes in David patch cant be applied on > amd-staging-drm-next, which all patches I should take to get it correctly? > Mhm, what you say here actually doesn't make much sense. Yes I did mixed different issue here at end , please ignore it. When the sequence number is invalid because the submission failed amdgpu_ctx_get_fence() returns an error: > if (seq >= centity->sequence) { > spin_unlock(&ctx->ring_lock); > return ERR_PTR(-EINVAL); > } This error is then handled in amdgpu_cs_fence_to_handle_ioctl(): > fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence); > if (IS_ERR(fence)) > return PTR_ERR(fence); So the error handling for this is actually in place. The only thing that could go wrong is that userspace sends a sequence number which is to small. The correct solution is to either set the flag as I suggested and make sync_file_create() capable of handling a NULL fence. Or we make the stub fence global like David suggested. Regards, Christian. Thanks Christian for explanation , understood. We should take any of suggest approach in for solving issue. Will give some more try to reproduce this and try the fix. -Deepak > >>>> -Deepak >>>>> Christian. >>>>> >>>>>> -David >>>>>>> If that patch was applied then please revert it immediately. >>>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> -David >>>>>>>>> If you have already pushed the patch then please revert. >>>>>>>>> >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>>> Signed-off-by: Deepak Sharma <Deepak.Sharma@xxxxxxx> >>>>>>>>>> --- >>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++ >>>>>>>>>> 1 file changed, 2 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>>>>>>>> index 024dfbd87f11..14166cd8a12f 100644 >>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>>>>>>>>> @@ -1403,6 +1403,8 @@ static struct dma_fence >>>>>>>>>> *amdgpu_cs_get_fence(struct amdgpu_device *adev, >>>>>>>>>> fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no); >>>>>>>>>> amdgpu_ctx_put(ctx); >>>>>>>>>> + if(!fence) >>>>>>>>>> + return ERR_PTR(-EINVAL); >>>>>>>>>> return fence; >>>>>>>>>> } >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx