[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

Am 09.10.2017 um 13:27 schrieb Nicolai Hähnle:
> On 09.10.2017 13:12, Christian König wrote:
>>>
>>>> Nicolai, how hard would it be to handle ENODEV as failure for all 
>>>> currently existing contexts?
>>>
>>> Impossible? "All currently existing contexts" is not a well-defined 
>>> concept when multiple drivers co-exist in the same process.
>>
>> Ok, let me refine the question: I assume there are resources "shared" 
>> between contexts, like binary shader code for example, which need to 
>> be reuploaded when VRAM is lost.
>>
>> How hard would it be to handle that correctly?
>
> Okay, that makes more sense :)
>
> With the current interface it's still pretty difficult, but if we 
> could get a new per-device query ioctl which returns a "VRAM loss 
> counter", it would be reasonably straight-forward.

The problem with the VRAM lost counter is that this isn't safe either. 
E.g. you could have an application which uploads shaders, then a GPU reset 
happens and VRAM is lost, and then the application creates a new context 
and makes submissions with broken shader binaries.

So I would still vote for a separate IOCTL to reset the VRAM lost state, 
which is called *before* user space starts to reupload 
shaders/descriptors etc...

This way you also catch the case when another reset happens while you 
reupload things.
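
To make the intended ordering concrete, here is a rough sketch of the user 
space side; the ack wrapper and the re-upload helper are purely hypothetical 
names, only the call order matters:

/*
 * Rough user space sketch; hypothetical_amdgpu_ack_vram_lost() and
 * reupload_shared_resources() are assumptions used to illustrate the
 * ordering, not existing interfaces.
 */
int hypothetical_amdgpu_ack_vram_lost(int fd); /* would wrap the new IOCTL */
int reupload_shared_resources(int fd);         /* shaders, descriptors, ... */

static int recover_from_vram_loss(int fd)
{
        int r;

        /* Acknowledge the VRAM lost state *before* touching VRAM again, so
         * a reset that happens during the re-upload is not silently missed. */
        r = hypothetical_amdgpu_ack_vram_lost(fd);
        if (r < 0)
                return r;

        /* Only now re-create the resources shared between contexts. */
        return reupload_shared_resources(fd);
}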

>
>>> And what would be the purpose of this? If it's to support VRAM loss, 
>>> having a per-context VRAM loss counter would enable each context to 
>>> signal ECANCELED separately.
>>
>> I thought of that on top of the -ENODEV handling.
>>
>> In other words when we see -ENODEV we call an IOCTL to let the kernel 
>> know we noticed that something is wrong and then reinit all shared 
>> resources in userspace.
>>
>> All existing contexts will still see -ECANCELED when we drop their 
>> command submissions, but new contexts would at least not immediately 
>> cause a new lockup due to corrupted shader binaries.
>
> I don't think we need -ENODEV for this. We just need -ECANCELED to be 
> returned when a submission is rejected due to reset (hang or VRAM loss).
>
> Mesa would keep a shadow of the VRAM loss counter in pipe_screen and 
> pipe_context, and query the kernel's counter when it encounters 
> -ECANCELED. Each context would then know to drop the CS it's built up 
> so far and restart based on comparing the VRAM loss counter of 
> pipe_screen to that of pipe_context, and similarly we could keep a 
> copy of the VRAM loss counter for important buffer objects like shader 
> binaries, descriptors, etc.
>
> This seems more robust to me than relying only on an ENODEV. We'd most 
> likely keep some kind of VRAM loss counter in Mesa *anyway* (we don't 
> maintain a list of all shaders, for example, and we can't nuke 
> important per-context state across threads), and synthesizing such a counter 
> from ENODEVs is not particularly robust (what if multiple ENODEVs 
> occur for the same loss event?).
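
For illustration, a minimal sketch of that scheme; every name below, 
including the per-device counter query, is a placeholder and not an 
existing interface:

#include <stdint.h>

/* Placeholder types/helpers standing in for pipe_screen/pipe_context. */
struct my_screen  { int fd; uint64_t vram_lost_counter; };
struct my_context { struct my_screen *screen; uint64_t vram_lost_counter; };

uint64_t query_vram_lost_counter(int fd);   /* the proposed per-device query */
void drop_current_cs(struct my_context *ctx);
void reupload_context_resources(struct my_context *ctx);
void mark_context_lost(struct my_context *ctx);

/* Called when a submission comes back with -ECANCELED. */
static void handle_canceled_cs(struct my_context *ctx)
{
        struct my_screen *screen = ctx->screen;

        /* Refresh the shadow copy of the kernel's counter. */
        screen->vram_lost_counter = query_vram_lost_counter(screen->fd);

        if (ctx->vram_lost_counter != screen->vram_lost_counter) {
                /* VRAM was lost since this context last uploaded its data:
                 * drop the half-built CS and re-upload shaders, descriptors
                 * and other important buffers before submitting again. */
                drop_current_cs(ctx);
                reupload_context_resources(ctx);
                ctx->vram_lost_counter = screen->vram_lost_counter;
        } else {
                /* Reset without VRAM loss: just report the context as lost. */
                mark_context_lost(ctx);
        }
}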
>
> BTW, I still don't like ENODEV. It seems more like the kind of error 
> code you'd return with hot-pluggable GPUs where the device can 
> physically disappear...

Yeah, ECANCELED sounds like a better alternative. But I think we should 
still somehow signal to userspace how fatal losing VRAM is.

How about ENODATA or EBADFD?

Regards,
Christian.

>
> Cheers,
> Nicolai
>
>
>>
>> Regards,
>> Christian.
>>
>> Am 09.10.2017 um 13:04 schrieb Nicolai Hähnle:
>>> On 09.10.2017 12:59, Christian König wrote:
>>>> Nicolai, how hard would it be to handle ENODEV as failure for all 
>>>> currently existing contexts?
>>>
>>> Impossible? "All currently existing contexts" is not a well-defined 
>>> concept when multiple drivers co-exist in the same process.
>>>
>>> And what would be the purpose of this? If it's to support VRAM loss, 
>>> having a per-context VRAM loss counter would enable each context to 
>>> signal ECANCELED separately.
>>>
>>> Cheers,
>>> Nicolai
>>>
>>>
>>>>
>>>> Monk, would it be ok with you if we return ENODEV only for 
>>>> existing contexts when VRAM is lost and/or we have a strict mode GPU 
>>>> reset? E.g. newly created contexts would continue to work as they should.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 09.10.2017 um 12:49 schrieb Nicolai Hähnle:
>>>>> Hi Monk,
>>>>>
>>>>> Yes, you're right, we're only using ECANCELED internally. But as a 
>>>>> consequence, Mesa would already handle a kernel error of ECANCELED 
>>>>> on context loss correctly :)
>>>>>
>>>>> Cheers,
>>>>> Nicolai
>>>>>
>>>>> On 09.10.2017 12:35, Liu, Monk wrote:
>>>>>> Hi Christian
>>>>>>
>>>>>> You rejected some of my patches that return -ENODEV, on the 
>>>>>> grounds that MESA doesn't handle -ENODEV.
>>>>>>
>>>>>> But if Nicolai can confirm that MESA does handle 
>>>>>> -ECANCELED, then we need to align our error codes overall. In 
>>>>>> detail, the IOCTLs below can return error codes:
>>>>>>
>>>>>> amdgpu_cs_ioctl
>>>>>> amdgpu_cs_wait_ioctl
>>>>>> amdgpu_cs_wait_fences_ioctl
>>>>>> amdgpu_info_ioctl
>>>>>>
>>>>>>
>>>>>> My patches do the following:
>>>>>> return -ENODEV on cs_ioctl if the context is detected as guilty,
>>>>>> also return -ENODEV on cs_wait|cs_wait_fences if the fence is 
>>>>>> signaled but carries the error -ETIME,
>>>>>> also return -ENODEV on info_ioctl so the UMD can query whether a gpu reset 
>>>>>> happened after the process was created (because in strict mode we 
>>>>>> block the process instead of the context)
>>>>>>
>>>>>>
>>>>>> according to Nicolai:
>>>>>>
>>>>>> amdgpu_cs_ioctl *can* return -ECANCELED, but frankly 
>>>>>> speaking, the kernel side doesn't return "-ECANCELED" anywhere, so 
>>>>>> this solution on the MESA side doesn't align with the *current* amdgpu 
>>>>>> driver,
>>>>>> which only returns 0 on success or -EINVAL on other errors but 
>>>>>> definitely no "-ECANCELED" error code,
>>>>>>
>>>>>> so if we are talking about community rules we shouldn't let MESA 
>>>>>> handle -ECANCELED; we should have a unified error code
>>>>>>
>>>>>> + Marek
>>>>>>
>>>>>> BR Monk
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Haehnle, Nicolai
>>>>>> Sent: 2017-10-09 18:14
>>>>>> To: Koenig, Christian <Christian.Koenig at amd.com>; Liu, Monk 
>>>>>> <Monk.Liu at amd.com>; Nicolai Hähnle <nhaehnle at gmail.com>; 
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining 
>>>>>> fences in amd_sched_entity_fini
>>>>>>
>>>>>> On 09.10.2017 10:02, Christian König wrote:
>>>>>>>> For the gpu reset patches (already submitted to pub) I would make the 
>>>>>>>> kernel
>>>>>>>> return -ENODEV if the awaited fence (in the cs_wait or wait_fences 
>>>>>>>> IOCTL)
>>>>>>>> is found to be in error; that way the UMD would take the robustness 
>>>>>>>> extension path
>>>>>>>> and consider that a GPU hang occurred,
>>>>>>> Well that is only closed source behavior which is completely
>>>>>>> irrelevant for upstream development.
>>>>>>>
>>>>>>> As far as I know we haven't pushed the change to return -ENODEV 
>>>>>>> upstream.
>>>>>>
>>>>>> FWIW, radeonsi currently expects -ECANCELED on CS submissions and 
>>>>>> treats those as context lost. Perhaps we could use the same error 
>>>>>> on fences?
>>>>>> That makes more sense to me than -ENODEV.
>>>>>>
>>>>>> Cheers,
>>>>>> Nicolai
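
To illustrate, a sketch of how a fence error could be surfaced in the wait 
path; the helper is hypothetical and the concrete errnos (-ECANCELED, -ETIME) 
are placeholders, since the choice of error code is exactly what is being 
discussed here:

#include <linux/dma-fence.h>
#include <linux/errno.h>

/*
 * Hypothetical helper, not the actual amdgpu_cs_wait_ioctl(): if the fence
 * signaled with an error recorded via dma_fence_set_error(), surface that
 * to user space as -ECANCELED (placeholder errno, see discussion).
 */
static long wait_fence_and_translate(struct dma_fence *fence, long timeout)
{
        long r = dma_fence_wait_timeout(fence, true, timeout);

        if (r < 0)
                return r;               /* e.g. -ERESTARTSYS */
        if (r == 0)
                return -ETIME;          /* not signaled in time (placeholder policy) */
        if (dma_fence_get_status(fence) < 0)
                return -ECANCELED;      /* signaled, but the job was dropped/reset */

        return 0;                       /* signaled successfully */
}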
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 09.10.2017 um 08:42 schrieb Liu, Monk:
>>>>>>>> Christian
>>>>>>>>
>>>>>>>>> It would be really nice to have an error code set on
>>>>>>>>> s_fence->finished before it is signaled, use 
>>>>>>>>> dma_fence_set_error()
>>>>>>>>> for this.
>>>>>>>> For the gpu reset patches (already submitted to pub) I would make the 
>>>>>>>> kernel
>>>>>>>> return -ENODEV if the awaited fence (in the cs_wait or wait_fences 
>>>>>>>> IOCTL)
>>>>>>>> is found to be in error; that way the UMD would take the robustness 
>>>>>>>> extension path
>>>>>>>> and consider that a GPU hang occurred,
>>>>>>>>
>>>>>>>> Don't know if this is expected for the case of a normal process 
>>>>>>>> being
>>>>>>>> killed or crashing, like the one Nicolai hit ... since no gpu 
>>>>>>>> hang occurs there
>>>>>>>>
>>>>>>>>
>>>>>>>> BR Monk
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>>>>>>> Behalf Of Christian König
>>>>>>>> Sent: 2017-09-28 23:01
>>>>>>>> To: Nicolai Hähnle <nhaehnle at gmail.com>;
>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>> Cc: Haehnle, Nicolai <Nicolai.Haehnle at amd.com>
>>>>>>>> Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining
>>>>>>>> fences in amd_sched_entity_fini
>>>>>>>>
>>>>>>>> Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:
>>>>>>>>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>>>>>>>
>>>>>>>>> Highly concurrent Piglit runs can trigger a race condition 
>>>>>>>>> where a
>>>>>>>>> pending SDMA job on a buffer object is never executed because the
>>>>>>>>> corresponding process is killed (perhaps due to a crash). 
>>>>>>>>> Since the
>>>>>>>>> job's fences were never signaled, the buffer object was 
>>>>>>>>> effectively
>>>>>>>>> leaked. Worse, the buffer was stuck wherever it happened to be at
>>>>>>>>> the time, possibly in VRAM.
>>>>>>>>>
>>>>>>>>> The symptom was user space processes stuck in interruptible waits
>>>>>>>>> with kernel stacks like:
>>>>>>>>>
>>>>>>>>>        [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
>>>>>>>>>        [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
>>>>>>>>>        [<ffffffffbc5e82d2>]
>>>>>>>>> reservation_object_wait_timeout_rcu+0x1c2/0x300
>>>>>>>>>        [<ffffffffc03ce56f>] 
>>>>>>>>> ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0
>>>>>>>>> [ttm]
>>>>>>>>>        [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
>>>>>>>>>        [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
>>>>>>>>>        [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
>>>>>>>>>        [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 
>>>>>>>>> [ttm]
>>>>>>>>>        [<ffffffffc042f523>] 
>>>>>>>>> amdgpu_bo_create_restricted+0x1f3/0x470
>>>>>>>>> [amdgpu]
>>>>>>>>>        [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
>>>>>>>>>        [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140
>>>>>>>>> [amdgpu]
>>>>>>>>>        [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120
>>>>>>>>> [amdgpu]
>>>>>>>>>        [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
>>>>>>>>>        [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
>>>>>>>>>        [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
>>>>>>>>>        [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
>>>>>>>>>        [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
>>>>>>>>>        [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>>>>>
>>>>>>>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>>>>>>>> Acked-by: Christian König <christian.koenig at amd.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
>>>>>>>>>     1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>> index 54eb77cffd9b..32a99e980d78 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>>>>>>> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
>>>>>>>>>              amd_sched_entity_is_idle(entity));
>>>>>>>>>         amd_sched_rq_remove_entity(rq, entity);
>>>>>>>>>         if (r) {
>>>>>>>>>             struct amd_sched_job *job;
>>>>>>>>>             /* Park the kernel for a moment to make sure it isn't processing
>>>>>>>>>              * our enity.
>>>>>>>>>              */
>>>>>>>>>             kthread_park(sched->thread);
>>>>>>>>>             kthread_unpark(sched->thread);
>>>>>>>>> -        while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
>>>>>>>>> +        while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
>>>>>>>>> +            struct amd_sched_fence *s_fence = job->s_fence;
>>>>>>>>> +            amd_sched_fence_scheduled(s_fence);
>>>>>>>> It would be really nice to have an error code set on
>>>>>>>> s_fence->finished before it is signaled, use 
>>>>>>>> dma_fence_set_error() for this.
>>>>>>>>
>>>>>>>> Additional to that it would be nice to note in the subject line 
>>>>>>>> that
>>>>>>>> this is a rather important bug fix.
>>>>>>>>
>>>>>>>> With that fixed the whole series is Reviewed-by: Christian König
>>>>>>>> <christian.koenig at amd.com>.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> +            amd_sched_fence_finished(s_fence);
>>>>>>>>> +            dma_fence_put(&s_fence->finished);
>>>>>>>>>                 sched->ops->free_job(job);
>>>>>>>>> +        }
>>>>>>>>>         }
>>>>>>>>>         kfifo_free(&entity->job_queue);
>>>>>>>>>     }
>>>>>>>>>     static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>>>>>>     {
>>>>>>>>>         struct amd_sched_entity *entity =
>>>>>>>>>             container_of(cb, struct amd_sched_entity, cb);
>>>>>>>>>         entity->dependency = NULL;
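
For reference, a sketch of the drained-jobs loop from the patch above with 
the suggested dma_fence_set_error() call added; the concrete error code is 
still the open question in this thread, so -ECANCELED below is only a 
placeholder:

        /* Sketch: inside amd_sched_entity_fini(), after parking the
         * scheduler thread, drain the queued jobs and signal their fences
         * with an error recorded first. */
        while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
                struct amd_sched_fence *s_fence = job->s_fence;

                amd_sched_fence_scheduled(s_fence);
                /* Record why the job never ran before signaling "finished";
                 * -ECANCELED stands in for whatever errno we agree on. */
                dma_fence_set_error(&s_fence->finished, -ECANCELED);
                amd_sched_fence_finished(s_fence);
                dma_fence_put(&s_fence->finished);
                sched->ops->free_job(job);
        }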
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


