Hi Steve,

There are two bail-out points in panfrost_job_hw_submit(): one is the
error path starting at pm_runtime_get_sync(), the other is the error
path starting at the WARN_ON() in the if statement. The PM imbalance
fixed by this patch is between these two paths, and I think the caller
of panfrost_job_hw_submit() cannot detect this imbalance from outside
the function.

panfrost_job_timedout() calls pm_runtime_put_noidle() for every job it
finds, but all jobs are added to pfdev->jobs just before
panfrost_job_hw_submit() is called. Therefore I think the imbalance
still exists. But I'm not sure whether we should add a pm_runtime_put()
on the error path after pm_runtime_get_sync(), or remove the
pm_runtime_put() on the error path after the WARN_ON(). Roughly, the
two options are sketched below.
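Something like this untested sketch (my reconstruction of the relevant
lines of panfrost_job_hw_submit(); the exact flavour of the existing
put after the WARN_ON() may differ from what I show here):

	/* Option (a): balance the counter as soon as the get fails.
	 * pm_runtime_get_sync() increments the usage counter even when
	 * it returns an error, so this error path needs its own put.
	 */
	ret = pm_runtime_get_sync(pfdev->dev);
	if (ret < 0) {
		pm_runtime_put_noidle(pfdev->dev);
		return;
	}

	/* Option (b): instead, drop the existing put after the WARN_ON()
	 * so that both bail-outs leave the counter elevated, and let the
	 * pm_runtime_put_noidle() in panfrost_job_timedout() balance the
	 * counter for both paths once the job times out.
	 */
	if (WARN_ON(job_read(pfdev, JS_COMMAND_NEXT(js))))
		return;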
As for the problem with panfrost_devfreq_record_busy(), this may be a
new bug and would need an independent patch to fix.

Regards,
Dinghao

> On 20/05/2020 12:05, Dinghao Liu wrote:
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when the call returns an error code. Thus a pairing decrement is
> > needed on the error handling path to keep the counter balanced.
> >
> > Signed-off-by: Dinghao Liu <dinghao.liu@xxxxxxxxxx>
>
> Actually I think we have the opposite problem. To be honest we don't
> handle this situation very well. By the time panfrost_job_hw_submit()
> is called the job has already been added to the pfdev->jobs array, so
> it's considered submitted even if it never actually lands on the
> hardware. So in the case of this function bailing out early we will
> then (eventually) hit a timeout and trigger a GPU reset.
>
> panfrost_job_timedout() iterates through the pfdev->jobs array and
> calls pm_runtime_put_noidle() for each job it finds. So there's no
> imbalance here that I can see.
>
> Have you actually observed the situation where pm_runtime_get_sync()
> returns a failure?
>
> HOWEVER, it appears that by bailing out early the call to
> panfrost_devfreq_record_busy() is never made, which as far as I can
> see means that there may be an extra call to
> panfrost_devfreq_record_idle() when the jobs have timed out. Which
> could underflow the counter.
>
> But equally looking at panfrost_job_timedout(), we only call
> panfrost_devfreq_record_idle() *once* even though multiple jobs might
> be processed.
>
> There's a completely untested patch below which in theory should fix
> that...
>
> Steve
>
> ----8<---
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 7914b1570841..f9519afca29d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -145,6 +145,8 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>  	u64 jc_head = job->jc;
>  	int ret;
>  
> +	panfrost_devfreq_record_busy(pfdev);
> +
>  	ret = pm_runtime_get_sync(pfdev->dev);
>  	if (ret < 0)
>  		return;
> @@ -155,7 +157,6 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>  	}
>  
>  	cfg = panfrost_mmu_as_get(pfdev, &job->file_priv->mmu);
> -	panfrost_devfreq_record_busy(pfdev);
>  
>  	job_write(pfdev, JS_HEAD_NEXT_LO(js), jc_head & 0xFFFFFFFF);
>  	job_write(pfdev, JS_HEAD_NEXT_HI(js), jc_head >> 32);
> @@ -410,12 +411,12 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
>  		if (pfdev->jobs[i]) {
>  			pm_runtime_put_noidle(pfdev->dev);
> +			panfrost_devfreq_record_idle(pfdev);
>  			pfdev->jobs[i] = NULL;
>  		}
>  	}
>  	spin_unlock_irqrestore(&pfdev->js->job_lock, flags);
>  
> -	panfrost_devfreq_record_idle(pfdev);
>  	panfrost_device_reset(pfdev);
>  
>  	for (i = 0; i < NUM_JOB_SLOTS; i++)
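P.S. If I apply the quoted hunks, the resulting pairing looks roughly
like this (my reconstruction from the diff above, untested, unrelated
lines elided): every job that reaches pfdev->jobs is recorded busy
exactly once up front, and on timeout gets exactly one
pm_runtime_put_noidle() plus one panfrost_devfreq_record_idle():

static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
{
	...
	/* Now runs before any bail-out, so each job added to
	 * pfdev->jobs is recorded busy exactly once. */
	panfrost_devfreq_record_busy(pfdev);

	ret = pm_runtime_get_sync(pfdev->dev);
	if (ret < 0)
		return;
	...
}

static void panfrost_job_timedout(struct drm_sched_job *sched_job)
{
	...
	for (i = 0; i < NUM_JOB_SLOTS; i++) {
		if (pfdev->jobs[i]) {
			pm_runtime_put_noidle(pfdev->dev);
			/* One idle record per timed-out job, pairing
			 * with the record_busy() in hw_submit(). */
			panfrost_devfreq_record_idle(pfdev);
			pfdev->jobs[i] = NULL;
		}
	}
	...
}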