Ping.....

Best wishes
Emily Deng

>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Deng, Emily
>Sent: Friday, November 8, 2019 10:56 AM
>To: Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>
>>-----Original Message-----
>>From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
>>Sent: Thursday, November 7, 2019 7:28 PM
>>To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>>On 07.11.19 at 11:25, Emily Deng wrote:
>>> When the job is already signaled, the s_fence is freed. This then
>>> leads to a null pointer dereference in amdgpu_device_gpu_recover.
>>
>>NAK, the s_fence is only set to NULL when the job is destroyed. See
>>drm_sched_job_cleanup().
>I know it is set to NULL in drm_sched_job_cleanup(). But there is one case
>where, by the time we enter amdgpu_device_gpu_recover, the job has already
>passed through drm_sched_job_cleanup() and is about to be freed. Sometimes
>amdgpu_device_gpu_recover is faster: at that point the job itself is not
>freed yet, but its s_fence is already NULL.
>>
>>When you see a job without an s_fence then that means the problem is
>>somewhere else.
>>
>>Regards,
>>Christian.
>>
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>>  drivers/gpu/drm/scheduler/sched_main.c     | 11 ++++++-----
>>>  2 files changed, 7 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index e6ce949..5a8f08e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>  	 *
>>>  	 * job->base holds a reference to parent fence
>>>  	 */
>>> -	if (job && job->base.s_fence->parent &&
>>> +	if (job && job->base.s_fence && job->base.s_fence->parent &&
>>>  	    dma_fence_is_signaled(job->base.s_fence->parent))
>>>  		job_signaled = true;
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 31809ca..56cc10e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -334,8 +334,8 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
>>>
>>>  			spin_lock(&rq->lock);
>>>  			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>>> -				if (bad->s_fence->scheduled.context ==
>>> -				    entity->fence_context) {
>>> +				if (bad->s_fence && (bad->s_fence->scheduled.context ==
>>> +				    entity->fence_context)) {
>>>  					if (atomic_read(&bad->karma) >
>>>  					    bad->sched->hang_limit)
>>>  						if (entity->guilty)
>>> @@ -376,7 +376,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>  	 * This iteration is thread safe as sched thread is stopped.
>>>  	 */
>>>  	list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
>>> -		if (s_job->s_fence->parent &&
>>> +		if (s_job->s_fence && s_job->s_fence->parent &&
>>>  		    dma_fence_remove_callback(s_job->s_fence->parent,
>>>  					      &s_job->cb)) {
>>>  			atomic_dec(&sched->hw_rq_count);
>>> @@ -395,7 +395,8 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>  	 *
>>>  	 * Job is still alive so fence refcount at least 1
>>>  	 */
>>> -	dma_fence_wait(&s_job->s_fence->finished, false);
>>> +	if (s_job->s_fence)
>>> +		dma_fence_wait(&s_job->s_fence->finished, false);
>>>
>>>  	/*
>>>  	 * We must keep bad job alive for later use during
>>> @@ -438,7 +439,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
>>>  	 * GPU recovers can't run in parallel.
>>>  	 */
>>>  	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>> -		struct dma_fence *fence = s_job->s_fence->parent;
>>> +		struct dma_fence *fence = s_job->s_fence ? s_job->s_fence->parent : NULL;
>>>
>>>  		atomic_inc(&sched->hw_rq_count);
>>>
>
>_______________________________________________
>amd-gfx mailing list
>amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx