On Mon, 21 Jun 2021 17:08:21 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 21/06/2021 15:02, Boris Brezillon wrote:
> > From: Steven Price <steven.price@xxxxxxx>
> > 
> > The hardware has a set of '_NEXT' registers that can hold a second job
> > while the first is executing. Make use of these registers to enqueue a
> > second job per slot.
> > 
> > v2:
> > * Make sure non-faulty jobs get properly paused/resumed on GPU reset
> > 
> > Signed-off-by: Steven Price <steven.price@xxxxxxx>
> > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_device.h |   2 +-
> >  drivers/gpu/drm/panfrost/panfrost_job.c    | 311 ++++++++++++++++-----
> >  2 files changed, 242 insertions(+), 71 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> > index 95e6044008d2..a87917b9e714 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> > +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> > @@ -101,7 +101,7 @@ struct panfrost_device {
> >  
> >  	struct panfrost_job_slot *js;
> >  
> > -	struct panfrost_job *jobs[NUM_JOB_SLOTS];
> > +	struct panfrost_job *jobs[NUM_JOB_SLOTS][2];
> >  	struct list_head scheduled_jobs;
> >  
> >  	struct panfrost_perfcnt *perfcnt;
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 1b5c636794a1..888eceed227f 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -4,6 +4,7 @@
> >  #include <linux/delay.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/io.h>
> > +#include <linux/iopoll.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/pm_runtime.h>
> >  #include <linux/dma-resv.h>
> > @@ -41,6 +42,7 @@ struct panfrost_queue_state {
> >  };
> >  
> >  struct panfrost_job_slot {
> > +	int irq;
> >  	struct panfrost_queue_state queue[NUM_JOB_SLOTS];
> >  	spinlock_t job_lock;
> >  };
> > @@ -148,9 +150,43 @@ static void panfrost_job_write_affinity(struct panfrost_device *pfdev,
> >  	job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32);
> >  }
> >  
> > +static struct panfrost_job *
> > +panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
> > +{
> > +	struct panfrost_job *job = pfdev->jobs[slot][0];
> > +
> > +	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
> > +	pfdev->jobs[slot][1] = NULL;
> > +
> > +	return job;
> > +}
> > +
> > +static unsigned int
> > +panfrost_enqueue_job(struct panfrost_device *pfdev, int slot,
> > +		     struct panfrost_job *job)
> > +{
> > +	if (!pfdev->jobs[slot][0]) {
> > +		pfdev->jobs[slot][0] = job;
> > +		return 0;
> > +	}
> > +
> > +	WARN_ON(pfdev->jobs[slot][1]);
> > +	pfdev->jobs[slot][1] = job;
> > +	return 1;
> > +}
> > +
> > +static u32
> > +panfrost_get_job_chain_flag(const struct panfrost_job *job)
> > +{
> > +	struct panfrost_fence *f = to_panfrost_fence(job->done_fence);
> > +
> > +	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
> 
> Is the seqno going to reliably toggle like this? We need to ensure that
> when there are two jobs on the hardware they have different "job chain
> disambiguation" flags.

f->seqno is assigned queue->emit_seqno, which increases monotonically at
submission time. Since nothing can fail after fence creation in the
submission path, two consecutive jobs on a given queue should have
different (f->seqno & 1) values.
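
For reference, the fence creation path does roughly the following
(paraphrased from panfrost_job.c and trimmed to the relevant part):

	static struct dma_fence *
	panfrost_fence_create(struct panfrost_device *pfdev, int js_num)
	{
		struct panfrost_job_slot *js = pfdev->js;
		struct panfrost_fence *fence;

		fence = kzalloc(sizeof(*fence), GFP_KERNEL);
		if (!fence)
			return ERR_PTR(-ENOMEM);

		fence->dev = pfdev->ddev;
		fence->queue = js_num;
		/* Monotonic per queue and never rolled back, so the seqno
		 * parity flips on every submission to this slot.
		 */
		fence->seqno = ++js->queue[js_num].emit_seqno;
		dma_fence_init(&fence->base, &panfrost_fence_ops,
			       &js->job_lock,
			       js->queue[js_num].fence_context, fence->seqno);

		return &fence->base;
	}

The only way two in-flight jobs on the same slot could end up with the
same parity would be a submission failing after fence creation, which
can't happen.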

> 
> Also that feature was only introduced in t76x. So relying on that would
> sadly kill off support for t60x, t62x and t72x (albeit I'm not sure how
> 'supported' these are with Mesa anyway).
> 
> It is possible to implement without the disambiguation flag - but it's
> a bit fiddly: it requires clearing out the _NEXT register, checking
> that you actually cleared it successfully (i.e. the hardware didn't
> just start the job before you cleared it) and then doing the action if
> still necessary. And of course then recovering from having cleared out
> _NEXT. There's a reason for adding the feature! ;)

As mentioned in my previous reply, I think I'll just disable this
feature on t72x- (see the sketch at the end of this mail).

> 
> I'll try to review the rest and give it a spin later - although of
> course it looks quite familiar ;)

Thank you for your valuable feedback.

Regards,

Boris
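
P.S.: the t72x- gating I have in mind is something along those lines
(untested sketch; HW_FEATURE_JOBCHAIN_DISAMBIGUATION would be a new
feature bit we'd have to add, so take the name as a placeholder):

	static u32
	panfrost_get_job_chain_flag(const struct panfrost_job *job)
	{
		struct panfrost_fence *f = to_panfrost_fence(job->done_fence);

		/* Pre-t76x GPUs have no disambiguation flag: never set it
		 * there, and have the submission path keep at most one job
		 * per slot on the HW instead of using the _NEXT registers.
		 */
		if (!panfrost_has_hw_feature(job->pfdev,
					     HW_FEATURE_JOBCHAIN_DISAMBIGUATION))
			return 0;

		return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
	}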