On 13.10.2017 10:46, Liu, Monk wrote: > Just revert Nicolaiâ??s patch,if other routine want to reference s_fence, > it should get the finished fence in the first place/time, > > For gpu_reset routine, it refers to s_fence only on those unfinished job > in sched_hw_job_reset, so totally safe to refer to s_fence pointer > > I wonder what issue Nicolai met with and submitted this patch The original motivation of my patch was to catch accidental use of job->s_fence after the fence was destroyed in amd_sched_process_job. Basically, prevent a dangling pointer. I still think that's a reasonable idea, though clearly my first attempt at it was just wrong. In Christian's v2 patch, it might make sense to add spin_unlock(&sched->job_list_lock); + dma_fence_put(&s_job->s_fence->finished); + s_job->s_fence = NULL; sched->ops->free_job(s_job); ... though I'm not 100% certain about how the fence lifetimes work. Cheers, Nicolai > > BR Monk > > *From:*amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] *On Behalf > Of *Liu, Monk > *Sent:* 2017å¹´10æ??13æ?¥16:40 > *To:* Koenig, Christian <Christian.Koenig at amd.com>; Nicolai Hähnle > <nhaehnle at gmail.com>; amd-gfx at lists.freedesktop.org > *Subject:* RE: regression with d6c650c0a8f6f671e49553725e1db541376d95f2 > > I doubt it would always work fineâ?¦ > > First, we have FENCE_TRACE reference s_fence->finished after > â??fence_signal(&fence->finished)â?? > > Second, we have trace_amd_sched_proess_job(s_fence) after > â??amd_sched_fence_finished()â??, > > If you put the finished before free_job() and by coincidence the > job_finish() get very soon executed youâ??ll have odds to hit wild pointer > on above two cases > > BR Monk > > *From:*Koenig, Christian > *Sent:* 2017å¹´10æ??13æ?¥16:17 > *To:* Liu, Monk <Monk.Liu at amd.com <mailto:Monk.Liu at amd.com>>; Nicolai > Hähnle <nhaehnle at gmail.com <mailto:nhaehnle at gmail.com>>; > amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org> > *Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2 > > Yeah, that change is actually incorrect and should be reverted. > > What we really need to do is remove dropping sched_job->s_fence from > amd_sched_process_job() into amd_sched_job_finish() directly before the > call to free_job(). > > Regards, > Christian. > > Am 13.10.2017 um 09:24 schrieb Liu, Monk: > > commit d6c650c0a8f6f671e49553725e1db541376d95f2 > Author: Nicolai Hähnle <nicolai.haehnle at amd.com> > <mailto:nicolai.haehnle at amd.com> > @@ -611,6 +611,10 @@ static int amd_sched_main(void *param) > >                fence = sched->ops->run_job(sched_job); >                amd_sched_fence_scheduled(s_fence); > + > +              /* amd_sched_process_job drops the job's reference > of the fence. */ > +              sched_job->s_fence = NULL; > + >                if (fence) { >                        s_fence->parent = dma_fence_get(fence); >                        r = dma_fence_add_callback(fence, &s_fence->cb, > > Hi Nicolai > > with this patch, you will break "amdgpu_sched_hw_job_reset()"routine: > > void > > amd_sched_hw_job_reset(structamd_gpu_scheduler > > *sched) > > { > > structamd_sched_job > > *s_job; > > spin_lock(&sched->job_list_lock); > > list_for_each_entry_reverse(s_job, > > &sched->ring_mirror_list, node) { > > if(s_job->s_fence->parent > > && > > fence_remove_callback(s_job->s_fence->parent, > >                     &s_job->s_fence->cb)) > > { > > fence_put(s_job->s_fence->parent); > >             s_job->s_fence->parent > > = > > NULL; > > atomic_dec(&sched->hw_rq_count); > >         } > >     } > > spin_unlock(&sched->job_list_lock); > > } > > see that without sched_job->s_fence, you cannot remove the callback > from its hw fence, > > any idea?? > > BR Monk > -- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte.