Sorry, just got what you mean, will submit a v2 patch.

On Wed Aug 18, 2021 at 04:08:37PM +0800, Jingwen Chen wrote:
> On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote:
> >
> > Am 17.08.21 um 15:37 schrieb Andrey Grodzovsky:
> > > On 2021-08-17 12:28 a.m., Jingwen Chen wrote:
> > > > [Why]
> > > > For a bailing job, this commit deletes it from the pending list, so the
> > > > bailing job never gets a chance to be resubmitted, even in advance
> > > > tdr mode.
> > > >
> > > > [How]
> > > > After embedding the hw_fence into amdgpu_job is done, the race condition
> > > > that this commit tries to work around is completely solved, so revert
> > > > this commit.
> > > > This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
> > >
> > >
> > > Can you elaborate please how this solves the race?
> > > As far as I see, with this patch reverted, in drm_sched_job_timedout
> > > you get a pointer to the next job to process in the timed out handler;
> > > immediately after, this job actually finishes and its fence is signaled,
> > > which in turn triggers drm_sched_get_cleanup_job, which fetches this job
> > > and returns to
> Hi Andrey,
>
> If drm_sched_job_timedout is triggered first, drm_sched_get_cleanup_job
> will return NULL while the timeout worker is running, according to this
> code:
>
> 	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> 	     !cancel_delayed_work(&sched->work_tdr)) ||
> 	    kthread_should_park())
> 		return NULL;
>
> But yes, a dma_fence_get(job->s_fence->parent) is needed before
> processing the timedout_job. When the bad job is signaled just before
> processing, amdgpu_fence_free can be triggered by the dma_fence_put()
> of the HW fence.
> And if I'm understanding this race condition correctly, the spin_lock is
> still needed here to stop drm_sched_get_cleanup_job from taking the
> spin_lock first and then entering the tdr work.
> > > drm_sched_main,
> > > which in turn calls the free_job callback->...->amdgpu_fence_free,
> > > which frees the job from the HW dma_fence release callback. After
> > > that you proceed with a freed job in the timed out handler.
> > >
> > > If you could take the HW fence reference from drm_sched_job_timedout
> > > before starting processing then yes, I think it would work.
> >
> > Yes, precisely that's what I had in mind as well and it seems to be
> > missing from this patch.
> >
> > Regards,
> > Christian.
> >
> > > Andrey
> > >
> > > >
> > > > Signed-off-by: Jingwen Chen <Jingwen.Chen2@xxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/scheduler/sched_main.c | 27 --------------------------
> > > >  1 file changed, 27 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > index a2a953693b45..31d1176d939f 100644
> > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > @@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > >  	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
> > > >  
> > > >  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> > > > -
> > > > -	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
> > > > -	spin_lock(&sched->job_list_lock);
> > > >  	job = list_first_entry_or_null(&sched->pending_list,
> > > >  				       struct drm_sched_job, list);
> > > >  	if (job) {
> > > > -		/*
> > > > -		 * Remove the bad job so it cannot be freed by concurrent
> > > > -		 * drm_sched_cleanup_jobs. It will be reinserted back after
> > > > -		 * sched->thread is parked at which point it's safe.
> > > > -		 */
> > > > -		list_del_init(&job->list);
> > > > -		spin_unlock(&sched->job_list_lock);
> > > > -
> > > >  		status = job->sched->ops->timedout_job(job);
> > > >  
> > > >  		/*
> > > > @@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > >  			job->sched->ops->free_job(job);
> > > >  			sched->free_guilty = false;
> > > >  		}
> > > > -	} else {
> > > > -		spin_unlock(&sched->job_list_lock);
> > > >  	}
> > > >  
> > > >  	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> > > > @@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> > > >  	kthread_park(sched->thread);
> > > >  
> > > > -	/*
> > > > -	 * Reinsert back the bad job here - now it's safe as
> > > > -	 * drm_sched_get_cleanup_job cannot race against us and release the
> > > > -	 * bad job at this point - we parked (waited for) any in progress
> > > > -	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
> > > > -	 * now until the scheduler thread is unparked.
> > > > -	 */
> > > > -	if (bad && bad->sched == sched)
> > > > -		/*
> > > > -		 * Add at the head of the queue to reflect it was the earliest
> > > > -		 * job extracted.
> > > > -		 */
> > > > -		list_add(&bad->list, &sched->pending_list);
> > > > -
> > > >  	/*
> > > >  	 * Iterate the job list from later to earlier one and either deactive
> > > >  	 * their HW callbacks or remove them from pending list if they already
> >