Re: [PATCH 4/7] drm/msm/A6xx: Implement preemption for A7XX targets

Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> · Thu, 29 Aug 2024 00:53:04 +0530

On Wed, Aug 28, 2024 at 06:46:37AM -0700, Rob Clark wrote:
> On Wed, Aug 28, 2024 at 6:42 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco
> > <antomani103@xxxxxxxxx> wrote:
> > >
> > > On 8/27/24 11:07 PM, Rob Clark wrote:
> > > > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco
> > > > <antomani103@xxxxxxxxx> wrote:
> > > >>
> > > >> On 8/27/24 9:48 PM, Akhil P Oommen wrote:
> > > >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote:
> > > >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote:
> > > >>>>>
> > > >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote:
> > > >>>>>>
> > > >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote:
> > > >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote:
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote:
> > > >>>>>>>>> This patch implements preemption feature for A6xx targets, this allows
> > > >>>>>>>>> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX
> > > >>>>>>>>> hardware as such supports multiple levels of preemption granularities,
> > > >>>>>>>>> ranging from coarse grained(ringbuffer level) to a more fine grained
> > > >>>>>>>>> such as draw-call level or a bin boundary level preemption. This patch
> > > >>>>>>>>> enables the basic preemption level, with more fine grained preemption
> > > >>>>>>>>> support to follow.
> > > >>>>>>>>>
> > > >>>>>>>>> Signed-off-by: Sharat Masetty <smasetty@xxxxxxxxxxxxxx>
> > > >>>>>>>>> Signed-off-by: Antonino Maniscalco <antomani103@xxxxxxxxx>
> > > >>>>>>>>> ---
> > > >>>>>>>>
> > > >>>>>>>> No postamble packets which resets perfcounters? It is necessary. Also, I
> > > >>>>>>>> think we should disable preemption during profiling like we disable slumber.
> > > >>>>>>>>
> > > >>>>>>>> -Akhil.
> > > >>>>>>>
> > > >>>>>>> I don't see anything in kgsl which disables preemption during
> > > >>>>>>> profiling. It disables resetting perfcounters when doing system-wide
> > > >>>>>>> profiling, like freedreno, and in that case I assume preempting is
> > > >>>>>>> fine because the system profiler has a complete view of everything and
> > > >>>>>>> should "see" preemptions through the traces. For something like
> > > >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption
> > > >>>>>>> because we disable saving/restoring perf counters, but that has to
> > > >>>>>>> happen in userspace because the kernel doesn't know what userspace
> > > >>>>>>> does.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> KGSL does some sort of arbitration of perfcounter configurations and
> > > >>>>>> adds the select/enablement reg configuration as part of dynamic
> > > >>>>>> power up register list which we are not doing here. Is this something
> > > >>>>>> you are taking care of from userspace via preamble?
> > > >>>>>>
> > > >>>>>> -Akhil
> > > >>>>>
> > > >>>>> I don't think we have to take care of that in userspace, because Mesa
> > > >>>>> will always configure the counter registers before reading them in the
> > > >>>>> same submission, and if it gets preempted in the meantime then we're
> > > >>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets
> > > >>>>> them from userspace, which is why it has to do something to set them
> > > >>>>
> > > >>>> Sorry, should be "kgsl sets them from the kernel".
> > > >>>>
> > > >>>>> after IFPC slumber or a context switch when the HW state is gone.
> > > >>>>> Also, because the upstream approach doesn't play nicely with system
> > > >>>>> profilers like perfetto, VK_KHR_performance_query is hidden by default
> > > >>>>> behind a debug flag in turnip. So there's already an element of "this
> > > >>>>> is unsupported, you have to know what you're doing to use it."
> > > >>>
> > > >>> But when you have composition on GPU enabled, there will be very frequent
> > > >>> preemption. And I don't know how usable profiling tools will be in that
> > > >>> case unless you disable preemption with a Mesa debug flag. But for that
> > > >>> to work, all existing submitqueues should be destroyed and recreated.
> > > >>>
> > > >>> So I was thinking that we can use the sysprof propertry to force L0
> > > >>> preemption from kernel.
> > > >>>
> > > >>> -Akhil.
> > > >>>
> > > >>
> > > >> Right but when using a system profiler I imagined the expectation would
> > > >> be to be able to understand how applications and compositor interact. An
> > > >> use case could be measuring latency and understanding what contributes
> > > >> to it. That is actually the main reason I added traces for preemption.
> > > >> Disabling preemption would make it less useful for this type of
> > > >> analysis. Did you have an use case in mind for a system profiler that
> > > >> would benefit from disabling preemption and that is not covered by
> > > >> VK_KHR_performance_query (or equivalent GL ext)?

Please consider this as a friendly suggestion based on Conner's clarification.
Not a blocker. TBH, I don't have clairty on the profiling story in Mesa!

> > > >
> > > > I would think that we want to generate an event, with GPU timestamp
> > > > (ie. RB_DONE) and which ring we are switching to, so that perfetto/etc
> > > > could display multiple GPU timelines and where the switch from one to
> > > > the other happens.
> > > >
> > > > I'm a bit curious how this is handled on android, with AGI/etc.. I
> > > > don't see any support in perfetto for this.
> > > >
> > > > BR,
> > > > -R
> > > >
> > > >> Best regards,
> > > >> --
> > > >> Antonino Maniscalco <antomani103@xxxxxxxxx>
> > > >>
> > >
> > > Looking at KGSL they seem to use ftrace and I don't see it doing
> > > anything to get a timestamp from some GPU timer, really not sure how
> > > that would be put in a gpu timeline.

Yeah, we usually rely on ftraces which is good enough to measure preemption
latency.

-Akhil.

> >
> > I suspect it would require some work on perfetto trace-processor.  It
> > can ingest ftrace events (but those would end up being something
> > driver specific).  Maybe with u_trace and some tracepoints in the
> > 'ambles something could be done that would be more driver agnostic
> > (but idk if that would work for gpu's where preemption happens more
> > autonomously in the fw)
> 
> btw how to handle tracing preemption probably shouldn't hold up
> sending the next iteration of this series.  There isn't that much more
> time to get this in v6.12, and I think better visualization of
> preemption is going to take some work outside of the kernel.
> 
> BR,
> -R