On 10/30/19 6:22 AM, S, Shirish wrote:
> On 10/30/2019 3:50 PM, Koenig, Christian wrote:
>> On 30.10.19 at 10:13, S, Shirish wrote:
>>> [Why]
>>>
>>> Doing kthread_park()/unpark() from drm_sched_entity_fini
>>> while a GPU reset is in progress defeats the whole purpose of
>>> drm_sched_stop->kthread_park.
>>> If drm_sched_entity_fini->kthread_unpark() happens AFTER
>>> drm_sched_stop->kthread_park, nothing prevents another
>>> (third) thread from continuing to submit jobs to the HW. Those jobs
>>> are picked up by the unparked scheduler thread, which tries to
>>> submit them to the HW but fails because the HW ring is deactivated.
>>>
>>> [How]
>>> Grab the reset lock before calling drm_sched_entity_fini().
>>>
>>> Signed-off-by: Shirish S <shirish.s@xxxxxxx>
>>> Suggested-by: Christian König <christian.koenig@xxxxxxx>
>> Patch itself is Reviewed-by: Christian König <christian.koenig@xxxxxxx>
>>
>> Does that also fix the problems you have been seeing?
> Yes Christian.
>
> Regards,
>
> Shirish S

Missed that one. Why don't we fix it within the scheduler code? The race is within the scheduler.

Andrey

>
>> Thanks,
>> Christian.
>>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 6614d8a..2cdaf3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>>>  			continue;
>>>  		}
>>>  
>>> -		for (i = 0; i < num_entities; i++)
>>> +		for (i = 0; i < num_entities; i++) {
>>> +			mutex_lock(&ctx->adev->lock_reset);
>>>  			drm_sched_entity_fini(&ctx->entities[0][i].entity);
>>> +			mutex_unlock(&ctx->adev->lock_reset);
>>> +		}
>>>  	}
>>>  }
>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
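The race described under [Why] and the lock-based fix under [How] can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not kernel code, and every name in it (`struct model_sched`, `model_reset_begin()`, `model_entity_fini_locked()`, and so on) is hypothetical. A pthread mutex stands in for `adev->lock_reset`, and a `parked` flag stands in for the scheduler kthread's park state. Because the model runs in a single thread, the locked teardown uses `pthread_mutex_trylock()` to report that it would have to wait, whereas the actual patch blocks in `mutex_lock()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, not the kernel API:
 * lock_reset stands in for adev->lock_reset, and
 * parked stands in for the scheduler kthread being parked.
 */
struct model_sched {
	pthread_mutex_t lock_reset;
	bool in_reset;
	bool parked;
};

static void model_init(struct model_sched *s)
{
	pthread_mutex_init(&s->lock_reset, NULL);
	s->in_reset = false;
	s->parked = false;
}

/* Reset path: take the reset lock, then park the scheduler
 * (the role drm_sched_stop() plays in the real driver). */
static void model_reset_begin(struct model_sched *s)
{
	pthread_mutex_lock(&s->lock_reset);
	s->in_reset = true;
	s->parked = true;
}

/* Reset finished: restart the scheduler, then drop the lock. */
static void model_reset_end(struct model_sched *s)
{
	s->parked = false;
	s->in_reset = false;
	pthread_mutex_unlock(&s->lock_reset);
}

/* Teardown without the lock: park, then unconditionally unpark.
 * During a reset this undoes the park done by the reset path. */
static void model_entity_fini_unlocked(struct model_sched *s)
{
	s->parked = true;
	/* ... the entity would be removed from its run queue here ... */
	s->parked = false;
}

/* Teardown serialized against reset via the reset lock.
 * Returns false if a reset holds the lock (the real patch would block). */
static bool model_entity_fini_locked(struct model_sched *s)
{
	if (pthread_mutex_trylock(&s->lock_reset) != 0)
		return false;
	model_entity_fini_unlocked(s);
	pthread_mutex_unlock(&s->lock_reset);
	return true;
}
```

In this model, calling `model_entity_fini_unlocked()` between `model_reset_begin()` and `model_reset_end()` leaves `parked` false while `in_reset` is true, which corresponds to the unparked scheduler picking up jobs mid-reset. `model_entity_fini_locked()` instead cannot run until the reset has released the lock, so the reset's park is preserved.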