On 08.11.2017 at 07:39, Monk Liu wrote:
> If an app closes its CTX right after IB submission, GPU recovery
> will fail to find the entity/ctx behind the guilty job, and so
> skipping the bad job in the scheduler will fail as well.
>
> To fix this corner case, just move the job->karma increase out of
> the condition that the backing entity was found; that way the job
> itself will be "guilty" anyway.
>
> Change-Id: Ia30f02df9297a343d6d8dace496e237827dd1548
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>

Reviewed-by: Christian König <christian.koenig at amd.com>

> ---
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 7aa6455..720fd1b 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -464,6 +464,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
>  	spin_unlock(&sched->job_list_lock);
>
>  	if (bad) {
> +		atomic_inc(&bad->karma);
>  		/* don't increase @bad's karma if it's from KERNEL RQ,
>  		 * because sometimes GPU hang would cause kernel jobs (like VM updating jobs)
>  		 * corrupt but keep in mind that kernel jobs always considered good.
> @@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
>  		spin_lock(&rq->lock);
>  		list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>  			if (bad->s_fence->scheduled.context == entity->fence_context) {
> -				if (atomic_inc_return(&bad->karma) > bad->sched->hang_limit)
> +				if (atomic_read(&bad->karma) > bad->sched->hang_limit)
>  					if (entity->guilty)
>  						atomic_set(entity->guilty, 1);
>  				break;
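
Since the corner case is easy to misread, here is a minimal standalone
userspace model of the before/after behaviour. This is my own sketch,
not kernel code: HANG_LIMIT, struct job and struct entity are
simplified stand-ins, and entity == NULL models the CTX that was closed
right after IB submit. The point it illustrates is that the recovery
path decides whether to skip a job from bad->karma alone, so the
orphaned job must still accumulate karma even when there is no entity
left to mark guilty.

#include <stdatomic.h>
#include <stdio.h>

#define HANG_LIMIT 1	/* stand-in for sched->hang_limit */

struct entity {
	atomic_int guilty;
};

struct job {
	atomic_int karma;
	struct entity *entity;	/* NULL models "CTX closed right after IB submit" */
};

/* Old behaviour: karma only grows inside the entity lookup, so an
 * orphaned job (entity == NULL) never crosses the hang limit. */
static void hw_job_reset_old(struct job *bad)
{
	if (bad->entity) {
		if (atomic_fetch_add(&bad->karma, 1) + 1 > HANG_LIMIT)
			atomic_store(&bad->entity->guilty, 1);
	}
}

/* New behaviour: the job collects karma unconditionally; finding the
 * entity is only needed to propagate guilt to a still-live context. */
static void hw_job_reset_new(struct job *bad)
{
	atomic_fetch_add(&bad->karma, 1);
	if (bad->entity && atomic_load(&bad->karma) > HANG_LIMIT)
		atomic_store(&bad->entity->guilty, 1);
}

int main(void)
{
	struct job orphan = { .karma = 0, .entity = NULL };

	hw_job_reset_old(&orphan);
	hw_job_reset_old(&orphan);
	printf("old: karma=%d after two hangs -> job never skipped\n",
	       atomic_load(&orphan.karma));

	atomic_store(&orphan.karma, 0);
	hw_job_reset_new(&orphan);
	hw_job_reset_new(&orphan);
	printf("new: karma=%d > hang_limit=%d -> job skipped on recovery\n",
	       atomic_load(&orphan.karma), HANG_LIMIT);
	return 0;
}

Built with cc -std=c11, the old scheme leaves the orphaned job's karma
at 0, while the new scheme pushes it past the hang limit, so the
recovery path (which, as of this series, compares job karma against
sched->hang_limit) can cancel the job even though its context is gone.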