[AMD Official Use Only - Internal Distribution Only]

Hi Alex,
    When will we cherry-pick those patches to drm-next?

>-----Original Message-----
>From: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
>Sent: Tuesday, December 3, 2019 11:10 AM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; Deucher, Alexander
><Alexander.Deucher@xxxxxxx>
>Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Koenig,
>Christian <Christian.Koenig@xxxxxxx>; steven.price@xxxxxxx
>Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Yes - Christian just pushed it to drm-misc-next - I guess Alex/Christian
>didn't pull it into amd-staging-drm-next yet.
>
>Andrey
>
>On 12/2/19 2:24 PM, Deng, Emily wrote:
>> Hi Andrey,
>>     It seems this patch is still not in amd-staging-drm-next?
>>
>> Best wishes
>> Emily Deng
>>
>>> -----Original Message-----
>>> From: Deng, Emily
>>> Sent: Tuesday, November 26, 2019 4:41 PM
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
>>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
>>> Koenig, Christian <Christian.Koenig@xxxxxxx>; steven.price@xxxxxxx
>>> Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>
>>> Reviewed-by: Emily Deng <Emily.Deng@xxxxxxx>
>>>
>>>> -----Original Message-----
>>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
>>>> Sent: Tuesday, November 26, 2019 7:37 AM
>>>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
>>>> Koenig, Christian <Christian.Koenig@xxxxxxx>; Deng, Emily
>>>> <Emily.Deng@xxxxxxx>; steven.price@xxxxxxx
>>>> Subject: Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>>>>
>>>> Ping
>>>>
>>>> Andrey
>>>>
>>>> On 11/25/19 3:51 PM, Andrey Grodzovsky wrote:
>>>>> Problem:
>>>>> Due to a race between drm_sched_cleanup_jobs in the sched thread and
>>>>> drm_sched_job_timedout in the timeout work there is a possibility
>>>>> that the bad job was already freed while still being accessed from
>>>>> the timeout thread.
>>>>>
>>>>> Fix:
>>>>> Instead of just peeking at the bad job in the mirror list, remove it
>>>>> from the list under lock and then put it back later, when we are
>>>>> guaranteed no race with the main sched thread is possible, which is
>>>>> after the thread is parked.
>>>>>
>>>>> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>>>>>
>>>>> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>>>>> drm_sched_get_cleanup_job already has a lock there.
>>>>>
>>>>> v4: Fix comments to reflect latest code in drm-misc.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>>>> Reviewed-by: Christian König <christian.koenig@xxxxxxx>
>>>>> Tested-by: Emily Deng <Emily.Deng@xxxxxxx>
>>>>> ---
>>>>>   drivers/gpu/drm/scheduler/sched_main.c | 27 +++++++++++++++++++++++++++
>>>>>   1 file changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 6774955..1bf9c40 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>   	unsigned long flags;
>>>>>
>>>>>   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>>>> +
>>>>> +	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>   	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>   				       struct drm_sched_job, node);
>>>>>
>>>>>   	if (job) {
>>>>> +		/*
>>>>> +		 * Remove the bad job so it cannot be freed by concurrent
>>>>> +		 * drm_sched_cleanup_jobs. It will be reinserted back after
>>>>> +		 * sched->thread is parked at which point it's safe.
>>>>> +		 */
>>>>> +		list_del_init(&job->node);
>>>>> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> +
>>>>>   		job->sched->ops->timedout_job(job);
>>>>>
>>>>>   		/*
>>>>> @@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>   			job->sched->ops->free_job(job);
>>>>>   			sched->free_guilty = false;
>>>>>   		}
>>>>> +	} else {
>>>>> +		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>>   	}
>>>>>
>>>>>   	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> @@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>>   	kthread_park(sched->thread);
>>>>>
>>>>>   	/*
>>>>> +	 * Reinsert back the bad job here - now it's safe as
>>>>> +	 * drm_sched_get_cleanup_job cannot race against us and release the
>>>>> +	 * bad job at this point - we parked (waited for) any in progress
>>>>> +	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be
>>>>> +	 * called now until the scheduler thread is unparked.
>>>>> +	 */
>>>>> +	if (bad && bad->sched == sched)
>>>>> +		/*
>>>>> +		 * Add at the head of the queue to reflect it was the earliest
>>>>> +		 * job extracted.
>>>>> +		 */
>>>>> +		list_add(&bad->node, &sched->ring_mirror_list);
>>>>> +
>>>>> +	/*
>>>>>   	 * Iterate the job list from later to earlier one and either deactive
>>>>>   	 * their HW callbacks or remove them from mirror list if they already
>>>>>   	 * signaled.

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
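[Editor's note: the core of the patch above is a remove-under-lock / reinsert-after-park pattern. Below is a minimal user-space sketch of that pattern, with pthread mutexes and a hand-rolled list standing in for the kernel's spinlocks and <linux/list.h>; all names (`sched_job`, `remove_bad_job`, `reinsert_bad_job`) are illustrative, not the driver's API.]

```c
#include <pthread.h>
#include <stddef.h>

/* Hand-rolled circular doubly-linked list, modeled on <linux/list.h>. */
struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct sched_job {
	int id;
	struct list_head node;
};

static pthread_mutex_t job_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Timeout-handler side: instead of merely peeking at the oldest job,
 * unlink it from the mirror list while holding the lock, so a concurrent
 * cleanup path can no longer find (and free) it. Returns NULL if the
 * list is empty.
 */
static struct sched_job *remove_bad_job(struct list_head *mirror)
{
	struct sched_job *job = NULL;

	pthread_mutex_lock(&job_list_lock);
	if (mirror->next != mirror)
		job = (struct sched_job *)((char *)mirror->next -
					   offsetof(struct sched_job, node));
	if (job)
		list_del_init(&job->node);
	pthread_mutex_unlock(&job_list_lock);
	return job;
}

/*
 * Reset side: once the consumer thread is parked and no cleanup can run,
 * put the bad job back at the head of the list, reflecting that it was
 * the earliest job extracted.
 */
static void reinsert_bad_job(struct sched_job *job, struct list_head *mirror)
{
	pthread_mutex_lock(&job_list_lock);
	list_add_head(&job->node, mirror);
	pthread_mutex_unlock(&job_list_lock);
}
```

The point of the pattern is ownership transfer: while the job is off the list, only the timeout handler holds a reference to it, so the use-after-free window between drm_sched_job_timedout and the cleanup path is closed without holding the lock across the driver's timeout callback.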