On Fri, Aug 23, 2019 at 9:50 AM Steven Price <steven.price@xxxxxxx> wrote: > > On 23/08/2019 03:12, Rob Herring wrote: > > Doing a pm_runtime_put as soon as a job is submitted is wrong as it should > > not happen until the job completes. It works because we are relying on the > > autosuspend timeout to keep the h/w enabled. > > > > Cc: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx> > > Cc: Steven Price <steven.price@xxxxxxx> > > Cc: Alyssa Rosenzweig <alyssa.rosenzweig@xxxxxxxxxxxxx> > > Cc: David Airlie <airlied@xxxxxxxx> > > Cc: Daniel Vetter <daniel@xxxxxxxx> > > Signed-off-by: Rob Herring <robh@xxxxxxxxxx> > > Nice find, but I'm a bit worried about one thing. > > > --- > > v2: new patch > > > > drivers/gpu/drm/panfrost/panfrost_job.c | 16 +++++++++------- > > 1 file changed, 9 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c > > index 05c85f45a0de..80c9cab9a01b 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > > @@ -151,7 +151,7 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) > > return; > > > > if (WARN_ON(job_read(pfdev, JS_COMMAND_NEXT(js)))) > > - goto end; > > + return; > > > > cfg = panfrost_mmu_as_get(pfdev, &job->file_priv->mmu); > > > > @@ -187,10 +187,6 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js) > > job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START); > > > > spin_unlock_irqrestore(&pfdev->hwaccess_lock, flags); > > - > > -end: > > - pm_runtime_mark_last_busy(pfdev->dev); > > - pm_runtime_put_autosuspend(pfdev->dev); > > } > > > > static void panfrost_acquire_object_fences(struct drm_gem_object **bos, > > @@ -388,9 +384,13 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) > > > > mutex_lock(&pfdev->reset_lock); > > > > - for (i = 0; i < NUM_JOB_SLOTS; i++) > > + for (i = 0; i < NUM_JOB_SLOTS; i++) { > > drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); > > - > > + if (pfdev->jobs[i]) { > > + pm_runtime_put_noidle(pfdev->dev); > > + pfdev->jobs[i] = NULL; > > I can't see what prevents this racing with panfrost_job_irq_handler() - > the job could be completing at the same time as we assign NULL. Then > panfrost_job_irq_handler() will happily dereference the NULL pointer... The fact that 1 job's timeout handler is cleaning up all the other jobs is messy. I'm not sure if there's a better way... We could perhaps disable the job interrupt at the beginning though I think we can still have a race as we can't use disable_irq() with a shared irq. Or do this after the reset. > Admittedly this patch is an improvement over the situation before :) Yes, and I'd like to stop digging a deeper hole... Rob _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel