On 2024-09-13 19:28, Rob Clark wrote: > On Fri, Sep 13, 2024 at 10:03 AM Michel Dänzer > <michel.daenzer@xxxxxxxxxxx> wrote: >> >> On 2024-09-13 18:53, Rob Clark wrote: >>> From: Rob Clark <robdclark@xxxxxxxxxxxx> >>> >>> Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609 >>> >>> The whole premise of lockless access to a single-producer-single- >>> consumer queue is that there is just a single producer and single >>> consumer. That means we can't call drm_sched_can_queue() (which is >>> about queueing more work to the hw, not to the spsc queue) from >>> anywhere other than the consumer (wq). >>> >>> This call in the producer is just an optimization to avoid scheduling >>> the consuming worker if it cannot yet queue more work to the hw. It >>> is safe to drop this optimization to avoid the race condition. >>> >>> Suggested-by: Asahi Lina <lina@xxxxxxxxxxxxx> >>> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control") >>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx> >>> --- >>> drivers/gpu/drm/scheduler/sched_main.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>> index ab53ab486fe6..1af1dbe757d5 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup); >>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched, >>> struct drm_sched_entity *entity) >>> { >>> - if (drm_sched_can_queue(sched, entity)) >>> - drm_sched_run_job_queue(sched); >>> + drm_sched_run_job_queue(sched); >>> } >>> >>> /** >> >> The entity parameter is unused now. > > Right.. and we probably should collapse drm_sched_wakeup() and > drm_sched_run_job_queue().. I looked into that as well, seems fine to leave them separate for now though. > But this fix needs to be cherry picked back to a bunch of release > branches, so I intentionally avoided refactoring as part of the fix. Fixing up the drm_sched_wakeup caller(s) when backporting doesn't seem like a big deal. -- Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast