Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

"Koenig, Christian" <Christian.Koenig@xxxxxxx> · Wed, 30 Oct 2019 10:20:13 +0000

Am 30.10.19 um 10:13 schrieb S, Shirish:
> [Why]
>
> doing kthread_park()/unpark() from drm_sched_entity_fini
> while GPU reset is in progress defeats all the purpose of
> drm_sched_stop->kthread_park.
> If drm_sched_entity_fini->kthread_unpark() happens AFTER
> drm_sched_stop->kthread_park nothing prevents from another
> (third) thread to keep submitting job to HW which will be
> picked up by the unparked scheduler thread and try to submit
> to HW but fail because the HW ring is deactivated.
>
> [How]
> grab the reset lock before calling drm_sched_entity_fini()
>
> Signed-off-by: Shirish S <shirish.s@xxxxxxx>
> Suggested-by: Christian König <christian.koenig@xxxxxxx>

Patch itself is Reviewed-by: Christian König <christian.koenig@xxxxxxx>

Does that also fix the problems you have been seeing?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 6614d8a..2cdaf3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>   			continue;
>   		}
>   
> -		for (i = 0; i < num_entities; i++)
> +		for (i = 0; i < num_entities; i++) {
> +			mutex_lock(&ctx->adev->lock_reset);
>   			drm_sched_entity_fini(&ctx->entities[0][i].entity);
> +			mutex_unlock(&ctx->adev->lock_reset);
> +		}
>   	}
>   }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx