On Wed, Aug 28, 2024 at 06:46:37AM -0700, Rob Clark wrote: > On Wed, Aug 28, 2024 at 6:42 AM Rob Clark <robdclark@xxxxxxxxx> wrote: > > > > On Tue, Aug 27, 2024 at 3:56 PM Antonino Maniscalco > > <antomani103@xxxxxxxxx> wrote: > > > > > > On 8/27/24 11:07 PM, Rob Clark wrote: > > > > On Tue, Aug 27, 2024 at 1:25 PM Antonino Maniscalco > > > > <antomani103@xxxxxxxxx> wrote: > > > >> > > > >> On 8/27/24 9:48 PM, Akhil P Oommen wrote: > > > >>> On Fri, Aug 23, 2024 at 10:23:48AM +0100, Connor Abbott wrote: > > > >>>> On Fri, Aug 23, 2024 at 10:21 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote: > > > >>>>> > > > >>>>> On Thu, Aug 22, 2024 at 9:06 PM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote: > > > >>>>>> > > > >>>>>> On Wed, Aug 21, 2024 at 05:02:56PM +0100, Connor Abbott wrote: > > > >>>>>>> On Mon, Aug 19, 2024 at 9:09 PM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote: > > > >>>>>>>> > > > >>>>>>>> On Thu, Aug 15, 2024 at 08:26:14PM +0200, Antonino Maniscalco wrote: > > > >>>>>>>>> This patch implements preemption feature for A6xx targets, this allows > > > >>>>>>>>> the GPU to switch to a higher priority ringbuffer if one is ready. A6XX > > > >>>>>>>>> hardware as such supports multiple levels of preemption granularities, > > > >>>>>>>>> ranging from coarse grained(ringbuffer level) to a more fine grained > > > >>>>>>>>> such as draw-call level or a bin boundary level preemption. This patch > > > >>>>>>>>> enables the basic preemption level, with more fine grained preemption > > > >>>>>>>>> support to follow. > > > >>>>>>>>> > > > >>>>>>>>> Signed-off-by: Sharat Masetty <smasetty@xxxxxxxxxxxxxx> > > > >>>>>>>>> Signed-off-by: Antonino Maniscalco <antomani103@xxxxxxxxx> > > > >>>>>>>>> --- > > > >>>>>>>> > > > >>>>>>>> No postamble packets which resets perfcounters? It is necessary. Also, I > > > >>>>>>>> think we should disable preemption during profiling like we disable slumber. > > > >>>>>>>> > > > >>>>>>>> -Akhil. > > > >>>>>>> > > > >>>>>>> I don't see anything in kgsl which disables preemption during > > > >>>>>>> profiling. It disables resetting perfcounters when doing system-wide > > > >>>>>>> profiling, like freedreno, and in that case I assume preempting is > > > >>>>>>> fine because the system profiler has a complete view of everything and > > > >>>>>>> should "see" preemptions through the traces. For something like > > > >>>>>>> VK_KHR_performance_query I suppose we'd want to disable preemption > > > >>>>>>> because we disable saving/restoring perf counters, but that has to > > > >>>>>>> happen in userspace because the kernel doesn't know what userspace > > > >>>>>>> does. > > > >>>>>>> > > > >>>>>> > > > >>>>>> KGSL does some sort of arbitration of perfcounter configurations and > > > >>>>>> adds the select/enablement reg configuration as part of dynamic > > > >>>>>> power up register list which we are not doing here. Is this something > > > >>>>>> you are taking care of from userspace via preamble? > > > >>>>>> > > > >>>>>> -Akhil > > > >>>>> > > > >>>>> I don't think we have to take care of that in userspace, because Mesa > > > >>>>> will always configure the counter registers before reading them in the > > > >>>>> same submission, and if it gets preempted in the meantime then we're > > > >>>>> toast anyways (due to not saving/restoring perf counters). kgsl sets > > > >>>>> them from userspace, which is why it has to do something to set them > > > >>>> > > > >>>> Sorry, should be "kgsl sets them from the kernel". > > > >>>> > > > >>>>> after IFPC slumber or a context switch when the HW state is gone. > > > >>>>> Also, because the upstream approach doesn't play nicely with system > > > >>>>> profilers like perfetto, VK_KHR_performance_query is hidden by default > > > >>>>> behind a debug flag in turnip. So there's already an element of "this > > > >>>>> is unsupported, you have to know what you're doing to use it." > > > >>> > > > >>> But when you have composition on GPU enabled, there will be very frequent > > > >>> preemption. And I don't know how usable profiling tools will be in that > > > >>> case unless you disable preemption with a Mesa debug flag. But for that > > > >>> to work, all existing submitqueues should be destroyed and recreated. > > > >>> > > > >>> So I was thinking that we can use the sysprof propertry to force L0 > > > >>> preemption from kernel. > > > >>> > > > >>> -Akhil. > > > >>> > > > >> > > > >> Right but when using a system profiler I imagined the expectation would > > > >> be to be able to understand how applications and compositor interact. An > > > >> use case could be measuring latency and understanding what contributes > > > >> to it. That is actually the main reason I added traces for preemption. > > > >> Disabling preemption would make it less useful for this type of > > > >> analysis. Did you have an use case in mind for a system profiler that > > > >> would benefit from disabling preemption and that is not covered by > > > >> VK_KHR_performance_query (or equivalent GL ext)? Please consider this as a friendly suggestion based on Conner's clarification. Not a blocker. TBH, I don't have clairty on the profiling story in Mesa! > > > > > > > > I would think that we want to generate an event, with GPU timestamp > > > > (ie. RB_DONE) and which ring we are switching to, so that perfetto/etc > > > > could display multiple GPU timelines and where the switch from one to > > > > the other happens. > > > > > > > > I'm a bit curious how this is handled on android, with AGI/etc.. I > > > > don't see any support in perfetto for this. > > > > > > > > BR, > > > > -R > > > > > > > >> Best regards, > > > >> -- > > > >> Antonino Maniscalco <antomani103@xxxxxxxxx> > > > >> > > > > > > Looking at KGSL they seem to use ftrace and I don't see it doing > > > anything to get a timestamp from some GPU timer, really not sure how > > > that would be put in a gpu timeline. Yeah, we usually rely on ftraces which is good enough to measure preemption latency. -Akhil. > > > > I suspect it would require some work on perfetto trace-processor. It > > can ingest ftrace events (but those would end up being something > > driver specific). Maybe with u_trace and some tracepoints in the > > 'ambles something could be done that would be more driver agnostic > > (but idk if that would work for gpu's where preemption happens more > > autonomously in the fw) > > btw how to handle tracing preemption probably shouldn't hold up > sending the next iteration of this series. There isn't that much more > time to get this in v6.12, and I think better visualization of > preemption is going to take some work outside of the kernel. > > BR, > -R