> On 21.02.2024 14:34, Tvrtko Ursulin wrote:
> > On 21/02/2024 09:40, Adrián Larumbe wrote:
> > Hi,
> >
> > I just wanted to make sure we're on the same page on this matter. So in
> > Panfrost, and I guess in almost every other driver out there, HW perf
> > counters and their uAPI interface are orthogonal to fdinfo's reporting of
> > DRM engine utilisation.
> >
> > At the moment it seems like HW perf counters and the way they're exposed
> > to UM are very idiosyncratic, and any attempt to unify their interface
> > into a common set of ioctls sounds like a gargantuan task I wouldn't
> > like to be faced with.
>
> I share the same feeling on this sub-topic.
>
> > As for fdinfo, I guess there's more room for coming up with common
> > helpers that could handle the toggling of HW support for DRM engine
> > calculations, but I'd at least have to see how things are being done in,
> > let's say, Freedreno or Intel.
>
> For Intel we don't need this ability, at least for pre-GuC platforms.
> Stat collection is super cheap and permanently enabled there.
>
> But let me copy Umesh, because something at the back of my mind is
> telling me that perhaps there was something expensive about collecting
> these stats with the GuC backend. If so, maybe a toggle would be
> beneficial there.
>
> > Right now there's a pressing need to get rid of the debugfs knob for
> > fdinfo's DRM engine profiling sources in Panfrost, after which I could
> > perhaps draw up an RFC for how to generalise this to other drivers.
>
> There is a knob currently, meaning fdinfo does not work by default? If
> that is so, I would have at least expected someone to have submitted a
> patch for gputop to handle this toggle. It being a common reference
> implementation, I don't think it is great if it does not work out of the
> box.

It does sound like I forgot to document this knob at the time I submitted
fdinfo support for Panfrost.
I'll make a point of mentioning it in a new patch where I drop debugfs
support and enable toggling from sysfs instead.

> The toggle as an idea sounds a bit annoying, but if there is no other
> realistic way maybe it is not too bad. As long as it is documented in
> drm-usage-stats.rst, doesn't live in debugfs, and has some common
> plumbing implemented both on the kernel side and for the aforementioned
> gputop / igt_drm_fdinfo / igt_drm_clients. Where and how exactly is TBD.

As soon as the new patch is merged, I'll go and reflect the driver uAPI
changes in all three of these.

> Regards,
>
> Tvrtko

Cheers,
Adrian

> > On 16.02.2024 17:43, Tvrtko Ursulin wrote:
> > >
> > > On 16/02/2024 16:57, Daniel Vetter wrote:
> > > > On Wed, Feb 14, 2024 at 01:52:05PM +0000, Steven Price wrote:
> > > > > Hi Adrián,
> > > > >
> > > > > On 14/02/2024 12:14, Adrián Larumbe wrote:
> > > > > > A driver user expressed interest in being able to access engine
> > > > > > usage stats through fdinfo when debugfs is not built into their
> > > > > > kernel. In the current implementation this wasn't possible,
> > > > > > because it was assumed that, even for in-flight jobs, enabling
> > > > > > the cycle counter and timestamp registers would incur additional
> > > > > > power consumption, so both were kept disabled until toggled
> > > > > > through debugfs.
> > > > > >
> > > > > > A second read of the TRM made me think otherwise, but this is
> > > > > > something that would best be clarified by someone from ARM's
> > > > > > side.
> > > > >
> > > > > I'm afraid I can't give a definitive answer. This will probably
> > > > > vary depending on implementation. The command register
> > > > > enables/disables "propagation" of the cycle/timestamp values.
> > > > > This propagation will cost some power (gates are getting toggled),
> > > > > but whether that power is completely in the noise of the GPU as a
> > > > > whole I can't say.
> > > > > The out-of-tree kbase driver only enables the counters for jobs
> > > > > explicitly marked (BASE_JD_REQ_PERMON) or due to an explicit
> > > > > connection from a profiler.
> > > > >
> > > > > I'd be happier moving the debugfs file to sysfs rather than
> > > > > assuming that the power consumption is small enough for all
> > > > > platforms.
> > > > >
> > > > > Ideally we'd have some sort of kernel interface for a profiler to
> > > > > inform the kernel what it is interested in, but I can't
> > > > > immediately see how to make that useful across different drivers.
> > > > > kbase's profiling support works great with our profiling tools,
> > > > > but there's a very strong connection between the two.
> > > >
> > > > Yeah, I'm not sure whether a magic (worse, probably massively
> > > > different per driver) file in sysfs is needed to enable GPU perf
> > > > monitoring stats in fdinfo.
> > > >
> > > > I get that we have a bit of a gap because the Linux perf PMU stuff
> > > > is global, and you want per-process, and there's no real
> > > > per-process support for perf stats for devices. But that's probably
> > > > the direction we want to go, not so much fdinfo. At least for
> > > > hardware performance counters and things like that.
> > > >
> > > > IIRC the i915 PMU support had some integration for per-process
> > > > support; you might want to chat with Tvrtko for the kernel side and
> > > > Lionel for the userspace side. At least if I'm not making a
> > > > complete mess and my memory is vaguely related to reality. Adding
> > > > them both.
> > >
> > > Yeah, there are two separate things, i915 PMU and i915 Perf/OA.
> > >
> > > If my memory serves me right, I did indeed have per-process support
> > > for the i915 PMU implemented as an RFC (or at least a branch
> > > somewhere) some years back. IIRC it only exposed per-engine GPU
> > > utilisation, and I did not find it very useful versus the complexity.
> > > (I think it at least required maintaining a map of DRM clients per
> > > task.)
> > >
> > > Our more useful profiling uses a custom Perf/OA (Observation
> > > Architecture) interface, which is possibly similar to the kbase one
> > > mentioned above. Why it is a custom interface is explained in a large
> > > comment at the top of i915_perf.c. I'm not sure whether all of those
> > > reasons still hold, but overall perf does not sound like the right
> > > fit for detailed GPU profiling.
> > >
> > > Also, PMU drivers are very challenging to implement correctly, since
> > > the locking model and atomicity requirements are quite demanding.
> > >
> > > From my point of view, or at least my initial thinking: if custom
> > > per-driver solutions are strongly not desired, it could be
> > > interesting to look into whether there is enough commonality, at
> > > least in concepts, to see whether a new DRM-level common but
> > > extensible API would be doable. Even then it may be tricky to
> > > "extract" enough common code to justify it.
> > >
> > > Regards,
> > >
> > > Tvrtko
> > > >
> > > > Cheers, Sima
> > > > >
> > > > > Steve
> > > > > >
> > > > > > Adrián Larumbe (1):
> > > > > >   drm/panfrost: Always record job cycle and timestamp information
> > > > > >
> > > > > >  drivers/gpu/drm/panfrost/Makefile           |  2 --
> > > > > >  drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 ------------------
> > > > > >  drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 ------------
> > > > > >  drivers/gpu/drm/panfrost/panfrost_device.h  |  1 -
> > > > > >  drivers/gpu/drm/panfrost/panfrost_drv.c     |  5 -----
> > > > > >  drivers/gpu/drm/panfrost/panfrost_job.c     | 24 ++++++++-------------
> > > > > >  drivers/gpu/drm/panfrost/panfrost_job.h     |  1 -
> > > > > >  7 files changed, 9 insertions(+), 59 deletions(-)
> > > > > >  delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c
> > > > > >  delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h
> > > > > >
> > > > > > base-commit: 6b1f93ea345947c94bf3a7a6e668a2acfd310918
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
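For context on the interface being discussed throughout this thread: the
fdinfo format defined in drm-usage-stats.rst exposes per-engine busy time
as `drm-engine-<name>: <uint> ns` key/value pairs, and a tool like gputop
derives utilisation by sampling a client's fdinfo file twice and dividing
the busy-time delta by the wall-clock delta. A minimal sketch of that
calculation in Python follows; the sample snapshot text and numbers are
invented for illustration, not real Panfrost output:

```python
def parse_drm_fdinfo(text: str) -> dict:
    """Collect the drm-* key/value pairs from one /proc/<pid>/fdinfo/<fd> dump."""
    stats = {}
    for line in text.splitlines():
        if line.startswith("drm-"):
            key, _, value = line.partition(":")
            stats[key.strip()] = value.strip()
    return stats


def engine_busy_percent(before: dict, after: dict, wall_ns: int, engine: str) -> float:
    """Utilisation of one engine between two samples taken wall_ns apart.

    Per drm-usage-stats.rst, drm-engine-<name> values are '<uint> ns' of
    accumulated busy time, so utilisation is the busy delta over wall time.
    """
    key = f"drm-engine-{engine}"
    delta = int(after[key].split()[0]) - int(before[key].split()[0])
    return 100.0 * delta / wall_ns


# Invented sample: two snapshots of the same client, 10000 ns apart.
s1 = parse_drm_fdinfo("drm-driver:\tpanfrost\ndrm-engine-fragment:\t1000 ns\n")
s2 = parse_drm_fdinfo("drm-driver:\tpanfrost\ndrm-engine-fragment:\t6000 ns\n")
print(engine_busy_percent(s1, s2, 10_000, "fragment"))  # 50.0
```

A real client would additionally use the `drm-client-id` key to avoid
double-counting multiple fds that belong to the same DRM client, as the
spec recommends.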