On 12/13/2024 10:40 PM, Antonino Maniscalco wrote: > On 12/13/24 5:50 PM, Akhil P Oommen wrote: >> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote: >>> On 12/12/24 4:58 PM, Akhil P Oommen wrote: >>>> On 12/5/2024 10:24 PM, Rob Clark wrote: >>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx> >>>>> >>>>> Performance counter usage falls into two categories: >>>>> >>>>> 1. Local usage, where the counter configuration, start, and end read >>>>> happen within (locally to) a single SUBMIT. In this case, >>>>> there is >>>>> no dependency on counter configuration or values between submits, >>>>> and >>>>> in fact counters are normally cleared on context switches, >>>>> making it >>>>> impossible to rely on cross-submit state. >>>>> >>>>> 2. Global usage, where a single privilaged daemon/process is sampling >>>>> counter values across all processes for profiling. >>>>> >>>>> The two categories are mutually exclusive. While you can have many >>>>> processes making local counter usage, you cannot combine global and >>>>> local usage without the two stepping on each others feet (by changing >>>>> counter configuration). >>>>> >>>>> For global counter usage, there is already a SYSPROF param (since >>>>> global >>>>> counter usage requires disabling counter clearing on context switch). >>>>> This patch adds a REQ_CNTRS param to request local counter usage. If >>>>> one or more processes has requested counter usage, then a SYSPROF >>>>> request will fail with -EBUSY. And if SYSPROF is active, then >>>>> REQ_CNTRS >>>>> will fail with -EBUSY, maintaining the mutual exclusivity. >>>>> >>>>> This is purely an advisory interface to help coordinate userspace. >>>>> There is no real means of enforcement, but the worst that can >>>>> happen if >>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense >>>>> profiling data. >>>>> >>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx> >>>>> --- >>>>> kgsl takes a different approach, which involves a lot more UABI for >>>>> assigning counters to different processes. But I think by taking >>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the >>>>> counters they need in each SUBMIT, for their respective gl/vk perf- >>>>> counter extensions, we can take this simpler approach. >>>> >>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes >>>> that will come up in future generations). How will we ensure that here? >>>> >>>> I have plans to bring up IFPC support in near future. Also, I >>>> brought up >>>> this point during preemption series. But from the responses, I felt >>>> that >>>> profiling was not considered a serious usecase. Still I wonder how the >>>> perfcounter extensions work accurately with preemption. >>> >>> So back then I implemented the postamble IB to clear perf counters and >>> that gets disabled when sysprof (so global usage) is happening. The >>> kernel is oblivious to "Local isage" of profiling but in that case >>> really what we want to do is disable preemption which in my >>> understanding can be done from userspace with a PKT. In my understanding >>> this had us covered for all usecases. >> >> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did >> you mean? > > Ah, I thought it wasmentioned, sorry. > The packet I was referring to is: > <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </ > doc> > <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/> Ah! Okay. I think this packet is not used by the downstream blob. IMO, disabling preemption is still a suboptimal solution. > > BTW you mentioned wanting to look into IFPC. Since I too wanted to look > into implementing it wonder if you could let me know when you planned on > working on it. I have few patches in progress. Nothing final yet and need verification on the hw side. Also, I need to do some housekeeping here to debug gmu issues since IFPC increases the probability of those a lot. I will try to send out the patches very soon. -Akhil. > >> >> -Akhil. >> >>> >>> So what would you expect instead we should do kernel side to make >>> profiling preemption safe? >>> >>>> >>>> -Akhil >>>> >>>>> >>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 + >>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++- >>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 + >>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++- >>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++ >>>>> +++++- >>>>> include/uapi/drm/msm_drm.h | 1 + >>>>> 6 files changed, 85 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/ >>>>> drm/msm/adreno/adreno_gpu.c >>>>> index 31bbf2c83de4..f688e37059b8 100644 >>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct >>>>> msm_file_private *ctx, >>>>> if (!capable(CAP_SYS_ADMIN)) >>>>> return UERR(EPERM, drm, "invalid permissions"); >>>>> return msm_file_private_set_sysprof(ctx, gpu, value); >>>>> + case MSM_PARAM_REQ_CNTRS: >>>>> + return msm_file_private_request_counters(ctx, gpu, value); >>>>> default: >>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu- >>>>>> name, param); >>>>> } >>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/ >>>>> msm_drv.c >>>>> index 6416d2cb4efc..bf8314ff4a25 100644 >>>>> --- a/drivers/gpu/drm/msm/msm_drv.c >>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c >>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device >>>>> *dev, struct drm_file *file) >>>>> * It is not possible to set sysprof param to non-zero if gpu >>>>> * is not initialized: >>>>> */ >>>>> - if (priv->gpu) >>>>> + if (ctx->sysprof) >>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0); >>>>> + if (ctx->counters_requested) >>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0); >>>>> + >>>>> context_close(ctx); >>>>> } >>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/ >>>>> msm_gpu.c >>>>> index 82f204f3bb8f..013b59ca3bb1 100644 >>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c >>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c >>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct >>>>> platform_device *pdev, >>>>> gpu->nr_rings = nr_rings; >>>>> refcount_set(&gpu->sysprof_active, 1); >>>>> + refcount_set(&gpu->local_counters_active, 1); >>>>> return 0; >>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/ >>>>> msm_gpu.h >>>>> index e25009150579..83c61e523b1b 100644 >>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h >>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h >>>>> @@ -195,12 +195,28 @@ struct msm_gpu { >>>>> int nr_rings; >>>>> /** >>>>> - * sysprof_active: >>>>> + * @sysprof_active: >>>>> * >>>>> - * The count of contexts that have enabled system profiling. >>>>> + * The count of contexts that have enabled system profiling plus >>>>> one. >>>>> + * >>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to >>>>> keep >>>>> + * the under/overflow checks that refcount_t provides, but allow >>>>> + * multiple on/off transitions so we track the logical value >>>>> plus one.) >>>>> */ >>>>> refcount_t sysprof_active; >>>>> + /** >>>>> + * @local_counters_active: >>>>> + * >>>>> + * The count of contexts that have requested local (intra-submit) >>>>> + * performance counter usage plus one. >>>>> + * >>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to >>>>> keep >>>>> + * the under/overflow checks that refcount_t provides, but allow >>>>> + * multiple on/off transitions so we track the logical value >>>>> plus one.) >>>>> + */ >>>>> + refcount_t local_counters_active; >>>>> + >>>>> /** >>>>> * lock: >>>>> * >>>>> @@ -383,6 +399,13 @@ struct msm_file_private { >>>>> */ >>>>> int sysprof; >>>>> + /** >>>>> + * @counters_requested: >>>>> + * >>>>> + * Has the context requested local perfcntr usage. >>>>> + */ >>>>> + bool counters_requested; >>>>> + >>>>> /** >>>>> * comm: Overridden task comm, see MSM_PARAM_COMM >>>>> * >>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); >>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >>>>> struct msm_gpu *gpu, int sysprof); >>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, >>>>> + struct msm_gpu *gpu, int reqcntrs); >>>>> void __msm_file_private_destroy(struct kref *kref); >>>>> static inline void msm_file_private_put(struct msm_file_private >>>>> *ctx) >>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/ >>>>> msm/msm_submitqueue.c >>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644 >>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c >>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c >>>>> @@ -10,6 +10,15 @@ >>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >>>>> struct msm_gpu *gpu, int sysprof) >>>>> { >>>>> + int ret = 0; >>>>> + >>>>> + mutex_lock(&gpu->lock); >>>>> + >>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) > >>>>> 1)) { >>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); >>>>> + goto out_unlock; >>>>> + } >>>>> + >>>>> /* >>>>> * Since pm_runtime and sysprof_active are both refcounts, we >>>>> * call apply the new value first, and then unwind the previous >>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct >>>>> msm_file_private *ctx, >>>>> switch (sysprof) { >>>>> default: >>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", >>>>> sysprof); >>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); >>>>> + goto out_unlock; >>>>> case 2: >>>>> pm_runtime_get_sync(&gpu->pdev->dev); >>>>> fallthrough; >>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct >>>>> msm_file_private *ctx, >>>>> ctx->sysprof = sysprof; >>>>> - return 0; >>>>> +out_unlock: >>>>> + mutex_unlock(&gpu->lock); >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, >>>>> + struct msm_gpu *gpu, int reqctrs) >>>>> +{ >>>>> + int ret = 0; >>>>> + >>>>> + mutex_lock(&gpu->lock); >>>>> + >>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { >>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active"); >>>>> + goto out_unlock; >>>>> + } >>>>> + >>>>> + if (reqctrs) { >>>>> + if (ctx->counters_requested) { >>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested"); >>>>> + goto out_unlock; >>>>> + } >>>>> + >>>>> + ctx->counters_requested = true; >>>>> + refcount_inc(&gpu->local_counters_active); >>>>> + } else { >>>>> + if (!ctx->counters_requested) { >>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested"); >>>>> + goto out_unlock; >>>>> + } >>>>> + refcount_dec(&gpu->local_counters_active); >>>>> + ctx->counters_requested = false; >>>>> + } >>>>> + >>>>> +out_unlock: >>>>> + mutex_unlock(&gpu->lock); >>>>> + >>>>> + return ret; >>>>> } >>>>> void __msm_file_private_destroy(struct kref *kref) >>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h >>>>> index 2342cb90857e..ae7fb355e4a1 100644 >>>>> --- a/include/uapi/drm/msm_drm.h >>>>> +++ b/include/uapi/drm/msm_drm.h >>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec { >>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ >>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ >>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ >>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra- >>>>> submit) perfcntr usage */ >>>>> /* For backwards compat. The original support for preemption was >>>>> based on >>>>> * a single ring per priority level so # of priority levels equals >>>>> the # >>>> >>> >>> >>> Best regards, >> > > > Best regards,