On Wed, May 25, 2022 at 2:46 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
>
> On 24/05/2022 15:50, Rob Clark wrote:
> > On Tue, May 24, 2022 at 6:45 AM Tvrtko Ursulin
> > <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >>
> >>
> >> On 23/05/2022 23:53, Rob Clark wrote:
> >>> On Mon, May 23, 2022 at 7:45 AM Tvrtko Ursulin
> >>> <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>> Hi Rob,
> >>>>
> >>>> On 28/07/2021 02:06, Rob Clark wrote:
> >>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>>>>
> >>>>> The drm/scheduler provides additional prioritization on top of that
> >>>>> provided by however many ringbuffers (each with their own priority
> >>>>> level) are supported on a given generation. Expose the additional
> >>>>> levels of priority to userspace and map the userspace priority back
> >>>>> to ring (first level of priority) and scheduler priority (additional
> >>>>> priority levels within the ring).
> >>>>>
> >>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>>>> Acked-by: Christian König <christian.koenig@xxxxxxx>
> >>>>> ---
> >>>>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +-
> >>>>>  drivers/gpu/drm/msm/msm_gem_submit.c    |  4 +-
> >>>>>  drivers/gpu/drm/msm/msm_gpu.h           | 58 ++++++++++++++++++++++++-
> >>>>>  drivers/gpu/drm/msm/msm_submitqueue.c   | 35 +++++++--------
> >>>>>  include/uapi/drm/msm_drm.h              | 14 +++++-
> >>>>>  5 files changed, 88 insertions(+), 27 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>> index bad4809b68ef..748665232d29 100644
> >>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >>>>> @@ -261,8 +261,8 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
> >>>>>                         return ret;
> >>>>>                 }
> >>>>>                 return -EINVAL;
> >>>>> -       case MSM_PARAM_NR_RINGS:
> >>>>> -               *value = gpu->nr_rings;
> >>>>> +       case MSM_PARAM_PRIORITIES:
> >>>>> +               *value = gpu->nr_rings * NR_SCHED_PRIORITIES;
> >>>>>                 return 0;
> >>>>>         case MSM_PARAM_PP_PGTABLE:
> >>>>>                 *value = 0;
> >>>>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> >>>>> index 450efe59abb5..c2ecec5b11c4 100644
> >>>>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> >>>>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> >>>>> @@ -59,7 +59,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> >>>>>         submit->gpu = gpu;
> >>>>>         submit->cmd = (void *)&submit->bos[nr_bos];
> >>>>>         submit->queue = queue;
> >>>>> -       submit->ring = gpu->rb[queue->prio];
> >>>>> +       submit->ring = gpu->rb[queue->ring_nr];
> >>>>>         submit->fault_dumped = false;
> >>>>>
> >>>>>         INIT_LIST_HEAD(&submit->node);
> >>>>> @@ -749,7 +749,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >>>>>         /* Get a unique identifier for the submission for logging purposes */
> >>>>>         submitid = atomic_inc_return(&ident) - 1;
> >>>>>
> >>>>> -       ring = gpu->rb[queue->prio];
> >>>>> +       ring = gpu->rb[queue->ring_nr];
> >>>>>         trace_msm_gpu_submit(pid_nr(pid), ring->id, submitid,
> >>>>>                 args->nr_bos, args->nr_cmds);
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> >>>>> index b912cacaecc0..0e4b45bff2e6 100644
> >>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
> >>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> >>>>> @@ -250,6 +250,59 @@ struct msm_gpu_perfcntr {
> >>>>>         const char *name;
> >>>>>  };
> >>>>>
> >>>>> +/*
> >>>>> + * The number of priority levels provided by drm gpu scheduler.  The
> >>>>> + * DRM_SCHED_PRIORITY_KERNEL priority level is treated specially in some
> >>>>> + * cases, so we don't use it (no need for kernel generated jobs).
> >>>>> + */
> >>>>> +#define NR_SCHED_PRIORITIES (1 + DRM_SCHED_PRIORITY_HIGH - DRM_SCHED_PRIORITY_MIN)
> >>>>> +
> >>>>> +/**
> >>>>> + * msm_gpu_convert_priority - Map userspace priority to ring # and sched priority
> >>>>> + *
> >>>>> + * @gpu:        the gpu instance
> >>>>> + * @prio:       the userspace priority level
> >>>>> + * @ring_nr:    [out] the ringbuffer the userspace priority maps to
> >>>>> + * @sched_prio: [out] the gpu scheduler priority level which the userspace
> >>>>> + *              priority maps to
> >>>>> + *
> >>>>> + * With drm/scheduler providing its own level of prioritization, our total
> >>>>> + * number of available priority levels is (nr_rings * NR_SCHED_PRIORITIES).
> >>>>> + * Each ring is associated with its own scheduler instance. However, our
> >>>>> + * UABI is that lower numerical values are higher priority. So mapping the
> >>>>> + * single userspace priority level into ring_nr and sched_prio takes some
> >>>>> + * care. The userspace provided priority (when a submitqueue is created)
> >>>>> + * is mapped to ring nr and scheduler priority as such:
> >>>>> + *
> >>>>> + *   ring_nr    = userspace_prio / NR_SCHED_PRIORITIES
> >>>>> + *   sched_prio = NR_SCHED_PRIORITIES -
> >>>>> + *                (userspace_prio % NR_SCHED_PRIORITIES) - 1
> >>>>> + *
> >>>>> + * This allows generations without preemption (nr_rings==1) to have some
> >>>>> + * amount of prioritization, and provides more priority levels for gens
> >>>>> + * that do have preemption.
> >>>>
> >>>> I am exploring how different drivers handle priority levels and this
> >>>> caught my eye.
> >>>>
> >>>> Is the implication of the last paragraphs that on hw with nr_rings > 1,
> >>>> ring + 1 preempts ring?
> >>>
> >>> Other way around, at least from the uabi standpoint. Ie. ring[0]
> >>> preempts ring[1]
> >>
> >> Ah yes, I figured it out from the comments but then confused myself when
> >> writing the email.
> >>
> >>>> If so I am wondering, does the "spreading" of user visible priorities
> >>>> by NR_SCHED_PRIORITIES create non-preemptable levels within every
> >>>> "bucket", or how does that work?
> >>>
> >>> So, preemption is possible between any priority level before run_job()
> >>> gets called, which writes the job into the ringbuffer. After that
> >>
> >> Hmm how? Before run_job() the jobs are not runnable, sitting in the
> >> scheduler queues, right?
> >
> > I mean, if prio[0]+prio[1]+prio[2] map to a single ring, submit A on
> > prio[1] could be executed after submit B on prio[2] provided that
> > run_job(submitA) hasn't happened yet. So I guess it isn't "really"
> > preemption because the submit hasn't started running on the GPU yet.
> > But rather just scheduling according to priority.
> >
> >>> point, you only have "bucket" level preemption, because
> >>> NR_SCHED_PRIORITIES levels of priority get mapped to a single FIFO
> >>> ringbuffer.
> >>
> >> Right, and you have one GPU with four rings, which means you expose 12
> >> priority levels to userspace, did I get that right?
> >
> > Correct
> >
> >> If so how do you convey in the ABI that not all these priority levels
> >> are equal? Like userspace can submit at prio 4 and expect prio 3 to
> >> preempt, as would prio 2 preempt prio 3. While actual behaviour will not
> >> match - 3 will not preempt 4.
> >
> > It isn't really exposed to userspace, but perhaps it should be..
> > Userspace just knows that, to the extent possible, the kernel will try
> > to execute prio 3 before prio 4.
> >
> >> Also, does your userspace stack (EGL/Vulkan) use the priorities? I had a
> >> quick peek in Mesa but did not spot it - although I am not really at
> >> home there yet so maybe I missed it.
> >
> > Yes, there is an EGL extension:
> >
> > https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
> >
> > It is pretty limited, it only exposes three priority levels.
>
> Right, is that wired up on msm? And if it is, or could be, how do/would
> you map the three priority levels for GPUs which expose 3 priority
> levels versus the one which exposes 12?

We don't yet, but probably should, expose a cap to indicate to userspace
the # of hw rings vs # of levels of sched priority.

> Is it doable properly without leaking the drm/sched-internal
> implementation detail of three priority levels? Or if you went the other
> way and only exposed up to max 3 levels, then you lose one priority
> level your hardware supports, which is also not good.
>
> It is all quite interesting because your hardware is completely
> different from ours in this respect. In our case i915 decides when to
> preempt, hardware has no concept of priority (*).

It is really pretty much all in firmware.. a6xx is the first gen that
could do actual (non-cooperative) preemption (but that isn't implemented
yet in the upstream driver).

BR,
-R

> Regards,
>
> Tvrtko
>
> (*) Almost no concept of priority in hardware - we do have it on new
> GPUs and only on a subset of engine classes where render and compute
> share the EUs. But I think it's way different from Adreno.
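
As an illustration of the mapping discussed in the thread, the standalone
sketch below (not the driver code itself) applies the ring_nr/sched_prio
formulas from the msm_gpu_convert_priority() comment to the configuration
mentioned above: 4 rings with 3 scheduler priorities each, i.e. 12
userspace levels. The nr_rings value and the hard-coded
NR_SCHED_PRIORITIES constant are assumptions made only for demonstration.

  #include <stdio.h>

  #define NR_SCHED_PRIORITIES 3  /* assumed: LOW/NORMAL/HIGH, no KERNEL level */

  int main(void)
  {
          unsigned int nr_rings = 4;  /* hypothetical GPU with 4 ringbuffers */
          unsigned int prio;

          for (prio = 0; prio < nr_rings * NR_SCHED_PRIORITIES; prio++) {
                  /* formulas from the msm_gpu_convert_priority() comment */
                  unsigned int ring_nr = prio / NR_SCHED_PRIORITIES;
                  unsigned int sched_prio = NR_SCHED_PRIORITIES -
                                            (prio % NR_SCHED_PRIORITIES) - 1;

                  printf("userspace prio %2u -> ring_nr %u, sched_prio %u\n",
                         prio, ring_nr, sched_prio);
          }

          return 0;
  }

Running it makes the "bucket" behaviour visible: userspace prios 0-2 all
land on ring 0 (differing only in scheduler priority), 3-5 on ring 1, and
so on, which is why prio 3 cannot preempt prio 4 once both have been
written into the same ring.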