On Wed, Jan 10, 2024 at 2:50 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Tue, Jan 09, 2024 at 10:22:17AM -0800, Rob Clark wrote:
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.
> >
> > Changing the locking order means that scheduler/msm_job_run() can race
> > with the recovery kthread worker, with the result that the GPU gets an
> > extra runpm get when we are trying to power it off, leaving the GPU in
> > an unrecovered state.
>
> The recovery kthread is supposed to stop all the relevant schedulers,
> which should remove any possible race conditions. So unless there's more
> going on, or you have your own recovery kthread (don't, reuse the one from
> the scheduler with your own work items, that's why you can provide that)
> this looks like an incomplete/incorrect explanation ... ?
>
> Slightly confused

msm still uses its own recovery, which pre-dates the scheduler
conversion.  At one point (a year or two back?) I started looking at
integrating recovery with the scheduler.. at the time I think you
talked me out of it, but I don't remember the reason.

BR,
-R

> -Sima
>
> >
> > I'll need to come up with a different scheme for appeasing lockdep.
> >
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/msm/msm_gpu.c        | 11 +++++------
> >  drivers/gpu/drm/msm/msm_ringbuffer.c |  7 +++++--
> >  2 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index 095390774f22..655002b21b0d 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >  	struct msm_ringbuffer *ring = submit->ring;
> >  	unsigned long flags;
> >
> > -	pm_runtime_get_sync(&gpu->pdev->dev);
> > +	WARN_ON(!mutex_is_locked(&gpu->lock));
> >
> > -	mutex_lock(&gpu->lock);
> > +	pm_runtime_get_sync(&gpu->pdev->dev);
> >
> >  	msm_gpu_hw_init(gpu);
> >
> > +	submit->seqno = submit->hw_fence->seqno;
> > +
> >  	update_sw_cntrs(gpu);
> >
> >  	/*
> > @@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >  	gpu->funcs->submit(gpu, submit);
> >  	gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
> >
> > -	hangcheck_timer_reset(gpu);
> > -
> > -	mutex_unlock(&gpu->lock);
> > -
> >  	pm_runtime_put(&gpu->pdev->dev);
> > +	hangcheck_timer_reset(gpu);
> >  }
> >
> >  /*
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index e0ed27739449..548f5266a7d3 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >
> >  	msm_fence_init(submit->hw_fence, fctx);
> >
> > -	submit->seqno = submit->hw_fence->seqno;
> > -
> >  	mutex_lock(&priv->lru.lock);
> >
> >  	for (i = 0; i < submit->nr_bos; i++) {
> > @@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
> >
> >  	mutex_unlock(&priv->lru.lock);
> >
> > +	/* TODO move submit path over to using a per-ring lock.. */
> > +	mutex_lock(&gpu->lock);
> > +
> >  	msm_gpu_submit(gpu, submit);
> >
> > +	mutex_unlock(&gpu->lock);
> > +
> >  	return dma_fence_get(submit->hw_fence);
> >  }
> >
> > --
> > 2.43.0
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
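
As an aside, a minimal userspace sketch of the serialization the revert
restores may help make the race concrete. This is not kernel code and
not from the patch: runpm, gpu_lock, submit_thread and recovery_thread
are invented stand-ins, where runpm models the runtime-PM usage count
and gpu_lock models gpu->lock. The point is that once the runpm get/put
happens inside the lock, a recovery path that also takes the lock can
never observe the "extra" reference mid-submit.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t gpu_lock = PTHREAD_MUTEX_INITIALIZER;
static int runpm;        /* stand-in for the runtime-PM usage count */
static int gpu_powered;  /* stand-in for the actual power state */

/*
 * Submit path, with the locking order the revert restores: the runpm
 * reference is taken and dropped while gpu_lock is held.  With the
 * reverted commit, runpm++ happened *before* mutex_lock(), so the
 * recovery thread could run in that window and find an unexpected
 * reference while trying to power the GPU off.
 */
static void *submit_thread(void *arg)
{
	pthread_mutex_lock(&gpu_lock);
	runpm++;                /* models pm_runtime_get_sync() */
	gpu_powered = 1;
	/* ... ring submission would happen here, under the lock ... */
	runpm--;                /* models pm_runtime_put() */
	pthread_mutex_unlock(&gpu_lock);
	return NULL;
}

/*
 * Recovery path: because it takes the same lock, it only ever sees
 * runpm in a consistent state and can safely decide to power off.
 */
static void *recovery_thread(void *arg)
{
	pthread_mutex_lock(&gpu_lock);
	if (runpm == 0)
		gpu_powered = 0;  /* safe: no submit holds a reference */
	pthread_mutex_unlock(&gpu_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, submit_thread, NULL);
	pthread_create(&b, NULL, recovery_thread, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("runpm=%d powered=%d\n", runpm, gpu_powered);
	return 0;
}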