On 31/08/2023 22:34, Adrián Larumbe wrote:
> On 31.08.2023 16:54, Steven Price wrote:
>> On 24/08/2023 02:34, Adrián Larumbe wrote:
>>> The drm-stats fdinfo tags made available to user space are drm-engine,
>>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>>>
>>> This deviates from standard practice in other DRM drivers, where a single
>>> set of key:value pairs is provided for the whole render engine. However,
>>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>>> decision was made to calculate bus cycles and workload times separately.
>>>
>>> Maximum operating frequency is calculated at devfreq initialisation time.
>>> Current frequency is made available to user space because nvtop uses it
>>> when performing engine usage calculations.
>>>
>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@xxxxxxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>>  6 files changed, 102 insertions(+), 1 deletion(-)
>>>
>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> index a2ab99698ca8..3fd372301019 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>>  	job->requirements = args->requirements;
>>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>>  	job->mmu = file_priv->mmu;
>>> +	job->priv = file_priv;
>>>
>>>  	slot = panfrost_job_get_slot(job);
>>>
>>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>>  		goto err_free;
>>>  	}
>>>
>>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>>> +#if 0
>>> +	/* Add compute engine in the future */
>>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>>> +#endif
>>
>> I'm not sure what names are best, but slot 2 isn't actually a compute slot.
>>
>> Slot 0 is fragment, that name is fine.
>>
>> Slots 1 and 2 are actually the same (from a hardware perspective), but the
>> core affinity of the two slots cannot overlap, which means you need to
>> divide the GPU in two to usefully use both slots. The only GPU for which
>> this actually makes sense is the T628[1], as it has two (non-coherent)
>> core groups.
>>
>> The upshot is that slot 1 is used for all of vertex, tiling and compute.
>> Slot 2 is currently never used, but kbase will use it only for compute
>> (and only on the two core group GPUs).
>
> I think I might've been too quick to draw inspiration for this from a
> comment in panfrost_job.c:
>
> int panfrost_job_get_slot(struct panfrost_job *job)
> {
> 	/* JS0: fragment jobs.
> 	 * JS1: vertex/tiler jobs
> 	 * JS2: compute jobs
> 	 */
> [...]
> }
>
> Maybe I could rename the engine names to "fragment", "vertex-tiler" and
> "compute-only"? There's no reason why I would skimp on engine name length,
> and anything more descriptive would be just as good.

Yeah, those names are probably the best we're going to get. And I
certainly prefer the longer names.
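
For what it's worth, the rename would amount to something like this. It's
only a sketch against the panfrost_open() hunk quoted above: the engines[]
array and MAX_SLOT_NAME_LEN come from the patch, and MAX_SLOT_NAME_LEN would
need to be at least 13 bytes to hold the longer names:

	/* Slot names as proposed above: fragment, vertex-tiler, compute-only */
	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "fragment");
	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vertex-tiler");
#if 0
	/* Slot 2 is never used on current GPUs; see the note on core groups above */
	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "compute-only");
#endif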
>> Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
>> but I appreciate that's not very helpful to people who aren't intimately
>> familiar with the hardware ;)
>
> The downside of this is that both IGT's fdinfo library and nvtop will use
> the engine name for display, and like you said these numbers might mean
> nothing to someone who isn't acquainted with the hardware.

Indeed - I've spent way too much time with the hardware and there are many
subtleties, so I tend to avoid calling them anything other than "slot x"
(especially when talking to hardware engineers). For example, a test that
submits NULL jobs can submit them to any slot. However, once you get beyond
artificial tests, it is quite consistent that slot 0=fragment, slot
1=vertex-tiler (and compute), slot 2=never used (except for compute on dual
core groups).

Steve

>> Steve
>>
>> [1] And technically the T608, but that's even rarer and the T60x isn't
>> (yet) supported by Panfrost.
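
To make that mapping concrete, the slot selection described above boils
down to something like the following. This is a sketch mirroring the logic
in panfrost_job_get_slot(); the helper name example_get_slot is made up for
illustration, while PANFROST_JD_REQ_FS is the flag from the Panfrost uapi:

	static int example_get_slot(const struct panfrost_job *job)
	{
		/* Slot 0: fragment jobs */
		if (job->requirements & PANFROST_JD_REQ_FS)
			return 0;

		/* Slot 1: vertex/tiler and, in practice, compute jobs.
		 * Slot 2 is deliberately never chosen: it is only useful
		 * for compute on dual core group GPUs such as the T628.
		 */
		return 1;
	}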