On Mon, Dec 16, 2024 at 8:59 AM Connor Abbott <cwabbott0@xxxxxxxxx> wrote: > > On Mon, Dec 16, 2024 at 11:55 AM Akhil P Oommen > <quic_akhilpo@xxxxxxxxxxx> wrote: > > > > On 12/13/2024 10:40 PM, Antonino Maniscalco wrote: > > > On 12/13/24 5:50 PM, Akhil P Oommen wrote: > > >> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote: > > >>> On 12/12/24 4:58 PM, Akhil P Oommen wrote: > > >>>> On 12/5/2024 10:24 PM, Rob Clark wrote: > > >>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx> > > >>>>> > > >>>>> Performance counter usage falls into two categories: > > >>>>> > > >>>>> 1. Local usage, where the counter configuration, start, and end read > > >>>>> happen within (locally to) a single SUBMIT. In this case, > > >>>>> there is > > >>>>> no dependency on counter configuration or values between submits, > > >>>>> and > > >>>>> in fact counters are normally cleared on context switches, > > >>>>> making it > > >>>>> impossible to rely on cross-submit state. > > >>>>> > > >>>>> 2. Global usage, where a single privilaged daemon/process is sampling > > >>>>> counter values across all processes for profiling. > > >>>>> > > >>>>> The two categories are mutually exclusive. While you can have many > > >>>>> processes making local counter usage, you cannot combine global and > > >>>>> local usage without the two stepping on each others feet (by changing > > >>>>> counter configuration). > > >>>>> > > >>>>> For global counter usage, there is already a SYSPROF param (since > > >>>>> global > > >>>>> counter usage requires disabling counter clearing on context switch). > > >>>>> This patch adds a REQ_CNTRS param to request local counter usage. If > > >>>>> one or more processes has requested counter usage, then a SYSPROF > > >>>>> request will fail with -EBUSY. And if SYSPROF is active, then > > >>>>> REQ_CNTRS > > >>>>> will fail with -EBUSY, maintaining the mutual exclusivity. > > >>>>> > > >>>>> This is purely an advisory interface to help coordinate userspace. > > >>>>> There is no real means of enforcement, but the worst that can > > >>>>> happen if > > >>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense > > >>>>> profiling data. > > >>>>> > > >>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx> > > >>>>> --- > > >>>>> kgsl takes a different approach, which involves a lot more UABI for > > >>>>> assigning counters to different processes. But I think by taking > > >>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the > > >>>>> counters they need in each SUBMIT, for their respective gl/vk perf- > > >>>>> counter extensions, we can take this simpler approach. > > >>>> > > >>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes > > >>>> that will come up in future generations). How will we ensure that here? > > >>>> > > >>>> I have plans to bring up IFPC support in near future. Also, I > > >>>> brought up > > >>>> this point during preemption series. But from the responses, I felt > > >>>> that > > >>>> profiling was not considered a serious usecase. Still I wonder how the > > >>>> perfcounter extensions work accurately with preemption. > > >>> > > >>> So back then I implemented the postamble IB to clear perf counters and > > >>> that gets disabled when sysprof (so global usage) is happening. The > > >>> kernel is oblivious to "Local isage" of profiling but in that case > > >>> really what we want to do is disable preemption which in my > > >>> understanding can be done from userspace with a PKT. In my understanding > > >>> this had us covered for all usecases. > > >> > > >> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did > > >> you mean? > > > > > > Ah, I thought it wasmentioned, sorry. > > > The packet I was referring to is: > > > <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </ > > > doc> > > > <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/> > > > > Ah! Okay. I think this packet is not used by the downstream blob. IMO, > > disabling preemption is still a suboptimal solution. > > Downstream doesn't expose userspace perfcounters (i.e. > VK_KHR_performance_query) at all. My understanding is that Android > requires you not to expose them for security reasons, but I could be > wrong there. In any case, Qualcomm clearly hasn't really thought > through what it would take to make everything work well with userspace > perfcounters and hasn't implemented the necessary firmware bits for > that, so we're left muddling through and doing what we can. That is correct, VK_KHR_performance_query is disallowed on android. There is an android CTS test for that. BR, -R > > Connor > > > > > > > > > BTW you mentioned wanting to look into IFPC. Since I too wanted to look > > > into implementing it wonder if you could let me know when you planned on > > > working on it. > > > > I have few patches in progress. Nothing final yet and need verification > > on the hw side. Also, I need to do some housekeeping here to debug gmu > > issues since IFPC increases the probability of those a lot. > > > > I will try to send out the patches very soon. > > > > -Akhil. > > > > > > > >> > > >> -Akhil. > > >> > > >>> > > >>> So what would you expect instead we should do kernel side to make > > >>> profiling preemption safe? > > >>> > > >>>> > > >>>> -Akhil > > >>>> > > >>>>> > > >>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 + > > >>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++- > > >>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 + > > >>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++- > > >>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++ > > >>>>> +++++- > > >>>>> include/uapi/drm/msm_drm.h | 1 + > > >>>>> 6 files changed, 85 insertions(+), 5 deletions(-) > > >>>>> > > >>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/ > > >>>>> drm/msm/adreno/adreno_gpu.c > > >>>>> index 31bbf2c83de4..f688e37059b8 100644 > > >>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > >>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > >>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct > > >>>>> msm_file_private *ctx, > > >>>>> if (!capable(CAP_SYS_ADMIN)) > > >>>>> return UERR(EPERM, drm, "invalid permissions"); > > >>>>> return msm_file_private_set_sysprof(ctx, gpu, value); > > >>>>> + case MSM_PARAM_REQ_CNTRS: > > >>>>> + return msm_file_private_request_counters(ctx, gpu, value); > > >>>>> default: > > >>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu- > > >>>>>> name, param); > > >>>>> } > > >>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/ > > >>>>> msm_drv.c > > >>>>> index 6416d2cb4efc..bf8314ff4a25 100644 > > >>>>> --- a/drivers/gpu/drm/msm/msm_drv.c > > >>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c > > >>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device > > >>>>> *dev, struct drm_file *file) > > >>>>> * It is not possible to set sysprof param to non-zero if gpu > > >>>>> * is not initialized: > > >>>>> */ > > >>>>> - if (priv->gpu) > > >>>>> + if (ctx->sysprof) > > >>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0); > > >>>>> + if (ctx->counters_requested) > > >>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0); > > >>>>> + > > >>>>> context_close(ctx); > > >>>>> } > > >>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/ > > >>>>> msm_gpu.c > > >>>>> index 82f204f3bb8f..013b59ca3bb1 100644 > > >>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c > > >>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c > > >>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct > > >>>>> platform_device *pdev, > > >>>>> gpu->nr_rings = nr_rings; > > >>>>> refcount_set(&gpu->sysprof_active, 1); > > >>>>> + refcount_set(&gpu->local_counters_active, 1); > > >>>>> return 0; > > >>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/ > > >>>>> msm_gpu.h > > >>>>> index e25009150579..83c61e523b1b 100644 > > >>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h > > >>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h > > >>>>> @@ -195,12 +195,28 @@ struct msm_gpu { > > >>>>> int nr_rings; > > >>>>> /** > > >>>>> - * sysprof_active: > > >>>>> + * @sysprof_active: > > >>>>> * > > >>>>> - * The count of contexts that have enabled system profiling. > > >>>>> + * The count of contexts that have enabled system profiling plus > > >>>>> one. > > >>>>> + * > > >>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to > > >>>>> keep > > >>>>> + * the under/overflow checks that refcount_t provides, but allow > > >>>>> + * multiple on/off transitions so we track the logical value > > >>>>> plus one.) > > >>>>> */ > > >>>>> refcount_t sysprof_active; > > >>>>> + /** > > >>>>> + * @local_counters_active: > > >>>>> + * > > >>>>> + * The count of contexts that have requested local (intra-submit) > > >>>>> + * performance counter usage plus one. > > >>>>> + * > > >>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to > > >>>>> keep > > >>>>> + * the under/overflow checks that refcount_t provides, but allow > > >>>>> + * multiple on/off transitions so we track the logical value > > >>>>> plus one.) > > >>>>> + */ > > >>>>> + refcount_t local_counters_active; > > >>>>> + > > >>>>> /** > > >>>>> * lock: > > >>>>> * > > >>>>> @@ -383,6 +399,13 @@ struct msm_file_private { > > >>>>> */ > > >>>>> int sysprof; > > >>>>> + /** > > >>>>> + * @counters_requested: > > >>>>> + * > > >>>>> + * Has the context requested local perfcntr usage. > > >>>>> + */ > > >>>>> + bool counters_requested; > > >>>>> + > > >>>>> /** > > >>>>> * comm: Overridden task comm, see MSM_PARAM_COMM > > >>>>> * > > >>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); > > >>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > >>>>> struct msm_gpu *gpu, int sysprof); > > >>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, > > >>>>> + struct msm_gpu *gpu, int reqcntrs); > > >>>>> void __msm_file_private_destroy(struct kref *kref); > > >>>>> static inline void msm_file_private_put(struct msm_file_private > > >>>>> *ctx) > > >>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/ > > >>>>> msm/msm_submitqueue.c > > >>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644 > > >>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c > > >>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c > > >>>>> @@ -10,6 +10,15 @@ > > >>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > >>>>> struct msm_gpu *gpu, int sysprof) > > >>>>> { > > >>>>> + int ret = 0; > > >>>>> + > > >>>>> + mutex_lock(&gpu->lock); > > >>>>> + > > >>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) > > > >>>>> 1)) { > > >>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); > > >>>>> + goto out_unlock; > > >>>>> + } > > >>>>> + > > >>>>> /* > > >>>>> * Since pm_runtime and sysprof_active are both refcounts, we > > >>>>> * call apply the new value first, and then unwind the previous > > >>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct > > >>>>> msm_file_private *ctx, > > >>>>> switch (sysprof) { > > >>>>> default: > > >>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", > > >>>>> sysprof); > > >>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); > > >>>>> + goto out_unlock; > > >>>>> case 2: > > >>>>> pm_runtime_get_sync(&gpu->pdev->dev); > > >>>>> fallthrough; > > >>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct > > >>>>> msm_file_private *ctx, > > >>>>> ctx->sysprof = sysprof; > > >>>>> - return 0; > > >>>>> +out_unlock: > > >>>>> + mutex_unlock(&gpu->lock); > > >>>>> + > > >>>>> + return ret; > > >>>>> +} > > >>>>> + > > >>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, > > >>>>> + struct msm_gpu *gpu, int reqctrs) > > >>>>> +{ > > >>>>> + int ret = 0; > > >>>>> + > > >>>>> + mutex_lock(&gpu->lock); > > >>>>> + > > >>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { > > >>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active"); > > >>>>> + goto out_unlock; > > >>>>> + } > > >>>>> + > > >>>>> + if (reqctrs) { > > >>>>> + if (ctx->counters_requested) { > > >>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested"); > > >>>>> + goto out_unlock; > > >>>>> + } > > >>>>> + > > >>>>> + ctx->counters_requested = true; > > >>>>> + refcount_inc(&gpu->local_counters_active); > > >>>>> + } else { > > >>>>> + if (!ctx->counters_requested) { > > >>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested"); > > >>>>> + goto out_unlock; > > >>>>> + } > > >>>>> + refcount_dec(&gpu->local_counters_active); > > >>>>> + ctx->counters_requested = false; > > >>>>> + } > > >>>>> + > > >>>>> +out_unlock: > > >>>>> + mutex_unlock(&gpu->lock); > > >>>>> + > > >>>>> + return ret; > > >>>>> } > > >>>>> void __msm_file_private_destroy(struct kref *kref) > > >>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h > > >>>>> index 2342cb90857e..ae7fb355e4a1 100644 > > >>>>> --- a/include/uapi/drm/msm_drm.h > > >>>>> +++ b/include/uapi/drm/msm_drm.h > > >>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec { > > >>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ > > >>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ > > >>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ > > >>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra- > > >>>>> submit) perfcntr usage */ > > >>>>> /* For backwards compat. The original support for preemption was > > >>>>> based on > > >>>>> * a single ring per priority level so # of priority levels equals > > >>>>> the # > > >>>> > > >>> > > >>> > > >>> Best regards, > > >> > > > > > > > > > Best regards, > >