On Thu, Dec 12, 2024 at 9:08 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2024 at 7:59 AM Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> wrote:
> >
> > On 12/5/2024 10:24 PM, Rob Clark wrote:
> > > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> > >
> > > Performance counter usage falls into two categories:
> > >
> > > 1. Local usage, where the counter configuration, start, and end read
> > >    happen within (locally to) a single SUBMIT.  In this case, there is
> > >    no dependency on counter configuration or values between submits, and
> > >    in fact counters are normally cleared on context switches, making it
> > >    impossible to rely on cross-submit state.
> > >
> > > 2. Global usage, where a single privileged daemon/process is sampling
> > >    counter values across all processes for profiling.
> > >
> > > The two categories are mutually exclusive.  While you can have many
> > > processes making local counter usage, you cannot combine global and
> > > local usage without the two stepping on each other's feet (by changing
> > > counter configuration).
> > >
> > > For global counter usage, there is already a SYSPROF param (since global
> > > counter usage requires disabling counter clearing on context switch).
> > > This patch adds a REQ_CNTRS param to request local counter usage.  If
> > > one or more processes has requested counter usage, then a SYSPROF
> > > request will fail with -EBUSY.  And if SYSPROF is active, then REQ_CNTRS
> > > will fail with -EBUSY, maintaining the mutual exclusivity.
> > >
> > > This is purely an advisory interface to help coordinate userspace.
> > > There is no real means of enforcement, but the worst that can happen if
> > > userspace ignores a REQ_CNTRS failure is that you'll get nonsense
> > > profiling data.
> > >
> > > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > > ---
> > > kgsl takes a different approach, which involves a lot more UABI for
> > > assigning counters to different processes.  But I think by taking
> > > advantage of the fact that mesa (freedreno+turnip) reconfigure the
> > > counters they need in each SUBMIT, for their respective gl/vk perf-
> > > counter extensions, we can take this simpler approach.
> >
> > KGSL's approach is preemption and IFPC safe (also whatever HW changes
> > come up in future generations).  How will we ensure that here?
> >
> > I have plans to bring up IFPC support in the near future.  Also, I brought
> > up this point during the preemption series.  But from the responses, I felt
> > that profiling was not considered a serious usecase.  Still I wonder how
> > the perfcounter extensions work accurately with preemption.
>
> Re: IFPC, I think initially we have to inhibit IFPC when SYSPROF is active
>
> Longer term, I think we want to just save and restore all of the SEL
> regs as well as the counters themselves on preemption.  AFAIU
> currently only the counters themselves are saved/restored.  But there
> is only one 32b SEL reg for each 64b counter, so I'm not sure that you
> save that many cycles by not just saving/restoring the SEL regs as
> well.  (And of course with REQ_CNTRS the kernel knows which processes
> need counter save/restore and which do not, so you are only taking the
> extra context switch overhead if a process is actually using the
> perfcntrs.)

Actually I'm maybe blending two different, but similar cases.
PREAMBLE/POSTAMBLE, I think, cover us for preemption

For IFPC, we'd need a way to tell GMU that SYSPROF is active, so it
could save/restore all the counters and selectors
(IFPC shouldn't matter for local profiling / REQ_CNTRS case, since you
wouldn't go into IFPC mid-submit.)

BR,
-R

> Alternatively, I think we could just declare this as a userspace
> problem, and solve it with CP_SET_AMBLE PREAMBLE/POSTAMBLE?
>
> Just for background, rendernode UABI is exposed to all processes that
> can use the GPU, ie. basically everything.  Which makes it an
> attractive attack surface.  This is why I prefer minimalism when it
> comes to UABI, and not adding new ioctls and complexity in the kernel
> when it is not essential ;-)
>
> BR,
> -R
>
> > -Akhil
> >
> > >
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +
> > >  drivers/gpu/drm/msm/msm_drv.c           |  5 ++-
> > >  drivers/gpu/drm/msm/msm_gpu.c           |  1 +
> > >  drivers/gpu/drm/msm/msm_gpu.h           | 29 +++++++++++++-
> > >  drivers/gpu/drm/msm/msm_submitqueue.c   | 52 ++++++++++++++++++++++++-
> > >  include/uapi/drm/msm_drm.h              |  1 +
> > >  6 files changed, 85 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > index 31bbf2c83de4..f688e37059b8 100644
> > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
> > >                  if (!capable(CAP_SYS_ADMIN))
> > >                          return UERR(EPERM, drm, "invalid permissions");
> > >                  return msm_file_private_set_sysprof(ctx, gpu, value);
> > > +        case MSM_PARAM_REQ_CNTRS:
> > > +                return msm_file_private_request_counters(ctx, gpu, value);
> > >          default:
> > >                  return UERR(EINVAL, drm, "%s: invalid param: %u", gpu->name, param);
> > >          }
> > > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > > index 6416d2cb4efc..bf8314ff4a25 100644
> > > --- a/drivers/gpu/drm/msm/msm_drv.c
> > > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > > @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device *dev, struct drm_file *file)
> > >           * It is not possible to set sysprof param to non-zero if gpu
> > >           * is not initialized:
> > >           */
> > > -        if (priv->gpu)
> > > +        if (ctx->sysprof)
> > >                  msm_file_private_set_sysprof(ctx, priv->gpu, 0);
> > >
> > > +        if (ctx->counters_requested)
> > > +                msm_file_private_request_counters(ctx, priv->gpu, 0);
> > > +
> > >          context_close(ctx);
> > >  }
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > > index 82f204f3bb8f..013b59ca3bb1 100644
> > > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > > @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> > >          gpu->nr_rings = nr_rings;
> > >
> > >          refcount_set(&gpu->sysprof_active, 1);
> > > +        refcount_set(&gpu->local_counters_active, 1);
> > >
> > >          return 0;
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> > > index e25009150579..83c61e523b1b 100644
> > > --- a/drivers/gpu/drm/msm/msm_gpu.h
> > > +++ b/drivers/gpu/drm/msm/msm_gpu.h
> > > @@ -195,12 +195,28 @@ struct msm_gpu {
> > >          int nr_rings;
> > >
> > >          /**
> > > -         * sysprof_active:
> > > +         * @sysprof_active:
> > >           *
> > > -         * The count of contexts that have enabled system profiling.
> > > +         * The count of contexts that have enabled system profiling plus one.
> > > +         *
> > > +         * Note: refcount_t does not like 0->1 transitions.. we want to keep
> > > +         * the under/overflow checks that refcount_t provides, but allow
> > > +         * multiple on/off transitions so we track the logical value plus one.)
> > >           */
> > >          refcount_t sysprof_active;
> > >
> > > +        /**
> > > +         * @local_counters_active:
> > > +         *
> > > +         * The count of contexts that have requested local (intra-submit)
> > > +         * performance counter usage plus one.
> > > +         *
> > > +         * Note: refcount_t does not like 0->1 transitions.. we want to keep
> > > +         * the under/overflow checks that refcount_t provides, but allow
> > > +         * multiple on/off transitions so we track the logical value plus one.)
> > > +         */
> > > +        refcount_t local_counters_active;
> > > +
> > >          /**
> > >           * lock:
> > >           *
> > > @@ -383,6 +399,13 @@ struct msm_file_private {
> > >           */
> > >          int sysprof;
> > >
> > > +        /**
> > > +         * @counters_requested:
> > > +         *
> > > +         * Has the context requested local perfcntr usage.
> > > +         */
> > > +        bool counters_requested;
> > > +
> > >          /**
> > >           * comm: Overridden task comm, see MSM_PARAM_COMM
> > >           *
> > > @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
> > >
> > >  int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> > >                                   struct msm_gpu *gpu, int sysprof);
> > > +int msm_file_private_request_counters(struct msm_file_private *ctx,
> > > +                                      struct msm_gpu *gpu, int reqcntrs);
> > >  void __msm_file_private_destroy(struct kref *kref);
> > >
> > >  static inline void msm_file_private_put(struct msm_file_private *ctx)
> > > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > index 7fed1de63b5d..1e1e21e6f7ae 100644
> > > --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> > > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> > > @@ -10,6 +10,15 @@
> > >  int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> > >                                   struct msm_gpu *gpu, int sysprof)
> > >  {
> > > +        int ret = 0;
> > > +
> > > +        mutex_lock(&gpu->lock);
> > > +
> > > +        if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) {
> > > +                ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
> > > +                goto out_unlock;
> > > +        }
> > > +
> > >          /*
> > >           * Since pm_runtime and sysprof_active are both refcounts, we
> > >           * call apply the new value first, and then unwind the previous
> > > @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> > >
> > >          switch (sysprof) {
> > >          default:
> > > -                return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
> > > +                ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
> > > +                goto out_unlock;
> > >          case 2:
> > >                  pm_runtime_get_sync(&gpu->pdev->dev);
> > >                  fallthrough;
> > > @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx,
> > >
> > >          ctx->sysprof = sysprof;
> > >
> > > -        return 0;
> > > +out_unlock:
> > > +        mutex_unlock(&gpu->lock);
> > > +
> > > +        return ret;
> > > +}
> > > +
> > > +int msm_file_private_request_counters(struct msm_file_private *ctx,
> > > +                                      struct msm_gpu *gpu, int reqctrs)
> > > +{
> > > +        int ret = 0;
> > > +
> > > +        mutex_lock(&gpu->lock);
> > > +
> > > +        if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
> > > +                ret = UERR(EBUSY, gpu->dev, "System profiling active");
> > > +                goto out_unlock;
> > > +        }
> > > +
> > > +        if (reqctrs) {
> > > +                if (ctx->counters_requested) {
> > > +                        ret = UERR(EINVAL, gpu->dev, "Already requested");
> > > +                        goto out_unlock;
> > > +                }
> > > +
> > > +                ctx->counters_requested = true;
> > > +                refcount_inc(&gpu->local_counters_active);
> > > +        } else {
> > > +                if (!ctx->counters_requested) {
> > > +                        ret = UERR(EINVAL, gpu->dev, "Not requested");
> > > +                        goto out_unlock;
> > > +                }
> > > +                refcount_dec(&gpu->local_counters_active);
> > > +                ctx->counters_requested = false;
> > > +        }
> > > +
> > > +out_unlock:
> > > +        mutex_unlock(&gpu->lock);
> > > +
> > > +        return ret;
> > >  }
> > >
> > >  void __msm_file_private_destroy(struct kref *kref)
> > > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> > > index 2342cb90857e..ae7fb355e4a1 100644
> > > --- a/include/uapi/drm/msm_drm.h
> > > +++ b/include/uapi/drm/msm_drm.h
> > > @@ -91,6 +91,7 @@ struct drm_msm_timespec {
> > >  #define MSM_PARAM_UBWC_SWIZZLE   0x12 /* RO */
> > >  #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
> > >  #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
> > > +#define MSM_PARAM_REQ_CNTRS      0x15 /* WO: request "local" (intra-submit) perfcntr usage */
> > >
> > >  /* For backwards compat. The original support for preemption was based on
> > >   * a single ring per priority level so # of priority levels equals the #
> >
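
For reference, a rough sketch of what the userspace side of the advisory
handshake described in the commit message could look like -- illustrative
only, not part of the patch.  It assumes the MSM_PARAM_REQ_CNTRS define
added above, the existing DRM_IOCTL_MSM_SET_PARAM ioctl and struct
drm_msm_param, and libdrm's drmIoctl(); the request_local_counters()
helper name is made up:

/*
 * Illustrative sketch only -- not part of the patch.  Assumes the
 * MSM_PARAM_REQ_CNTRS define from this series and the existing
 * DRM_IOCTL_MSM_SET_PARAM uapi; the helper name is made up.
 */
#include <errno.h>
#include <xf86drm.h>
#include "msm_drm.h"    /* the uapi header touched by this patch */

/* Ask the kernel to account this drm_file as a "local" (intra-submit)
 * perfcntr user (enable=1), or drop that request again (enable=0).
 */
static int request_local_counters(int fd, int enable)
{
        struct drm_msm_param req = {
                .pipe  = MSM_PIPE_3D0,
                .param = MSM_PARAM_REQ_CNTRS,
                .value = enable,
        };

        if (drmIoctl(fd, DRM_IOCTL_MSM_SET_PARAM, &req)) {
                /* EBUSY: a system profiler (SYSPROF) currently owns the
                 * counters, so per-submit results would be nonsense and
                 * the perfcntr extension should report itself unavailable.
                 */
                return -errno;
        }

        return 0;
}

A perfcntr-using process (e.g. mesa's gl/vk perfcounter extensions) would
make the request once per device fd before emitting counter configuration
in its SUBMITs and treat -EBUSY as "a system profiler owns the counters",
while a profiling daemon setting MSM_PARAM_SYSPROF would symmetrically get
-EBUSY as long as any such request is outstanding.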