On 12/17/2024 2:21 AM, Rob Clark wrote: > On Mon, Dec 16, 2024 at 12:25 PM Akhil P Oommen > <quic_akhilpo@xxxxxxxxxxx> wrote: >> >> On 12/16/2024 10:28 PM, Connor Abbott wrote: >>> On Mon, Dec 16, 2024 at 11:55 AM Akhil P Oommen >>> <quic_akhilpo@xxxxxxxxxxx> wrote: >>>> >>>> On 12/13/2024 10:40 PM, Antonino Maniscalco wrote: >>>>> On 12/13/24 5:50 PM, Akhil P Oommen wrote: >>>>>> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote: >>>>>>> On 12/12/24 4:58 PM, Akhil P Oommen wrote: >>>>>>>> On 12/5/2024 10:24 PM, Rob Clark wrote: >>>>>>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx> >>>>>>>>> >>>>>>>>> Performance counter usage falls into two categories: >>>>>>>>> >>>>>>>>> 1. Local usage, where the counter configuration, start, and end read >>>>>>>>> happen within (locally to) a single SUBMIT. In this case, >>>>>>>>> there is >>>>>>>>> no dependency on counter configuration or values between submits, >>>>>>>>> and >>>>>>>>> in fact counters are normally cleared on context switches, >>>>>>>>> making it >>>>>>>>> impossible to rely on cross-submit state. >>>>>>>>> >>>>>>>>> 2. Global usage, where a single privilaged daemon/process is sampling >>>>>>>>> counter values across all processes for profiling. >>>>>>>>> >>>>>>>>> The two categories are mutually exclusive. While you can have many >>>>>>>>> processes making local counter usage, you cannot combine global and >>>>>>>>> local usage without the two stepping on each others feet (by changing >>>>>>>>> counter configuration). >>>>>>>>> >>>>>>>>> For global counter usage, there is already a SYSPROF param (since >>>>>>>>> global >>>>>>>>> counter usage requires disabling counter clearing on context switch). >>>>>>>>> This patch adds a REQ_CNTRS param to request local counter usage. If >>>>>>>>> one or more processes has requested counter usage, then a SYSPROF >>>>>>>>> request will fail with -EBUSY. And if SYSPROF is active, then >>>>>>>>> REQ_CNTRS >>>>>>>>> will fail with -EBUSY, maintaining the mutual exclusivity. >>>>>>>>> >>>>>>>>> This is purely an advisory interface to help coordinate userspace. >>>>>>>>> There is no real means of enforcement, but the worst that can >>>>>>>>> happen if >>>>>>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense >>>>>>>>> profiling data. >>>>>>>>> >>>>>>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx> >>>>>>>>> --- >>>>>>>>> kgsl takes a different approach, which involves a lot more UABI for >>>>>>>>> assigning counters to different processes. But I think by taking >>>>>>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the >>>>>>>>> counters they need in each SUBMIT, for their respective gl/vk perf- >>>>>>>>> counter extensions, we can take this simpler approach. >>>>>>>> >>>>>>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes >>>>>>>> that will come up in future generations). How will we ensure that here? >>>>>>>> >>>>>>>> I have plans to bring up IFPC support in near future. Also, I >>>>>>>> brought up >>>>>>>> this point during preemption series. But from the responses, I felt >>>>>>>> that >>>>>>>> profiling was not considered a serious usecase. Still I wonder how the >>>>>>>> perfcounter extensions work accurately with preemption. >>>>>>> >>>>>>> So back then I implemented the postamble IB to clear perf counters and >>>>>>> that gets disabled when sysprof (so global usage) is happening. The >>>>>>> kernel is oblivious to "Local isage" of profiling but in that case >>>>>>> really what we want to do is disable preemption which in my >>>>>>> understanding can be done from userspace with a PKT. In my understanding >>>>>>> this had us covered for all usecases. >>>>>> >>>>>> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did >>>>>> you mean? >>>>> >>>>> Ah, I thought it wasmentioned, sorry. >>>>> The packet I was referring to is: >>>>> <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </ >>>>> doc> >>>>> <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/> >>>> >>>> Ah! Okay. I think this packet is not used by the downstream blob. IMO, >>>> disabling preemption is still a suboptimal solution. >>> >>> Downstream doesn't expose userspace perfcounters (i.e. >>> VK_KHR_performance_query) at all. My understanding is that Android >>> requires you not to expose them for security reasons, but I could be >>> wrong there. In any case, Qualcomm clearly hasn't really thought >>> through what it would take to make everything work well with userspace >>> perfcounters and hasn't implemented the necessary firmware bits for >>> that, so we're left muddling through and doing what we can. >>> >> >> Honestly, I don't know what you meant by the missing support. > > GMU support to save/restore SEL regs on IFPC when SYSPROF is enabled ;-) > That won't solve the case of reading counters via devmem. That still would require disabling ifpc. > If we had that, we wouldn't need ioclts to assign+configure counters, > everything else could be done in userspace (modulo trying to do both > local and global profiling at the same time.. but I'm not convinced > that is a valid use-case, as I mentioned earlier) > Lets ignore this use-case then. We can revisit if it becomes a thing in upstream. -Akhil. > BR, > -R > >> -Akhil. >> >>> Connor >>> >>>> >>>>> >>>>> BTW you mentioned wanting to look into IFPC. Since I too wanted to look >>>>> into implementing it wonder if you could let me know when you planned on >>>>> working on it. >>>> >>>> I have few patches in progress. Nothing final yet and need verification >>>> on the hw side. Also, I need to do some housekeeping here to debug gmu >>>> issues since IFPC increases the probability of those a lot. >>>> >>>> I will try to send out the patches very soon. >>>> >>>> -Akhil. >>>> >>>>> >>>>>> >>>>>> -Akhil. >>>>>> >>>>>>> >>>>>>> So what would you expect instead we should do kernel side to make >>>>>>> profiling preemption safe? >>>>>>> >>>>>>>> >>>>>>>> -Akhil >>>>>>>> >>>>>>>>> >>>>>>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 + >>>>>>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++- >>>>>>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 + >>>>>>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++- >>>>>>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++ >>>>>>>>> +++++- >>>>>>>>> include/uapi/drm/msm_drm.h | 1 + >>>>>>>>> 6 files changed, 85 insertions(+), 5 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/ >>>>>>>>> drm/msm/adreno/adreno_gpu.c >>>>>>>>> index 31bbf2c83de4..f688e37059b8 100644 >>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >>>>>>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct >>>>>>>>> msm_file_private *ctx, >>>>>>>>> if (!capable(CAP_SYS_ADMIN)) >>>>>>>>> return UERR(EPERM, drm, "invalid permissions"); >>>>>>>>> return msm_file_private_set_sysprof(ctx, gpu, value); >>>>>>>>> + case MSM_PARAM_REQ_CNTRS: >>>>>>>>> + return msm_file_private_request_counters(ctx, gpu, value); >>>>>>>>> default: >>>>>>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu- >>>>>>>>>> name, param); >>>>>>>>> } >>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/ >>>>>>>>> msm_drv.c >>>>>>>>> index 6416d2cb4efc..bf8314ff4a25 100644 >>>>>>>>> --- a/drivers/gpu/drm/msm/msm_drv.c >>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c >>>>>>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device >>>>>>>>> *dev, struct drm_file *file) >>>>>>>>> * It is not possible to set sysprof param to non-zero if gpu >>>>>>>>> * is not initialized: >>>>>>>>> */ >>>>>>>>> - if (priv->gpu) >>>>>>>>> + if (ctx->sysprof) >>>>>>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0); >>>>>>>>> + if (ctx->counters_requested) >>>>>>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0); >>>>>>>>> + >>>>>>>>> context_close(ctx); >>>>>>>>> } >>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/ >>>>>>>>> msm_gpu.c >>>>>>>>> index 82f204f3bb8f..013b59ca3bb1 100644 >>>>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c >>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c >>>>>>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct >>>>>>>>> platform_device *pdev, >>>>>>>>> gpu->nr_rings = nr_rings; >>>>>>>>> refcount_set(&gpu->sysprof_active, 1); >>>>>>>>> + refcount_set(&gpu->local_counters_active, 1); >>>>>>>>> return 0; >>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/ >>>>>>>>> msm_gpu.h >>>>>>>>> index e25009150579..83c61e523b1b 100644 >>>>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h >>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h >>>>>>>>> @@ -195,12 +195,28 @@ struct msm_gpu { >>>>>>>>> int nr_rings; >>>>>>>>> /** >>>>>>>>> - * sysprof_active: >>>>>>>>> + * @sysprof_active: >>>>>>>>> * >>>>>>>>> - * The count of contexts that have enabled system profiling. >>>>>>>>> + * The count of contexts that have enabled system profiling plus >>>>>>>>> one. >>>>>>>>> + * >>>>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to >>>>>>>>> keep >>>>>>>>> + * the under/overflow checks that refcount_t provides, but allow >>>>>>>>> + * multiple on/off transitions so we track the logical value >>>>>>>>> plus one.) >>>>>>>>> */ >>>>>>>>> refcount_t sysprof_active; >>>>>>>>> + /** >>>>>>>>> + * @local_counters_active: >>>>>>>>> + * >>>>>>>>> + * The count of contexts that have requested local (intra-submit) >>>>>>>>> + * performance counter usage plus one. >>>>>>>>> + * >>>>>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to >>>>>>>>> keep >>>>>>>>> + * the under/overflow checks that refcount_t provides, but allow >>>>>>>>> + * multiple on/off transitions so we track the logical value >>>>>>>>> plus one.) >>>>>>>>> + */ >>>>>>>>> + refcount_t local_counters_active; >>>>>>>>> + >>>>>>>>> /** >>>>>>>>> * lock: >>>>>>>>> * >>>>>>>>> @@ -383,6 +399,13 @@ struct msm_file_private { >>>>>>>>> */ >>>>>>>>> int sysprof; >>>>>>>>> + /** >>>>>>>>> + * @counters_requested: >>>>>>>>> + * >>>>>>>>> + * Has the context requested local perfcntr usage. >>>>>>>>> + */ >>>>>>>>> + bool counters_requested; >>>>>>>>> + >>>>>>>>> /** >>>>>>>>> * comm: Overridden task comm, see MSM_PARAM_COMM >>>>>>>>> * >>>>>>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); >>>>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >>>>>>>>> struct msm_gpu *gpu, int sysprof); >>>>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, >>>>>>>>> + struct msm_gpu *gpu, int reqcntrs); >>>>>>>>> void __msm_file_private_destroy(struct kref *kref); >>>>>>>>> static inline void msm_file_private_put(struct msm_file_private >>>>>>>>> *ctx) >>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/ >>>>>>>>> msm/msm_submitqueue.c >>>>>>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644 >>>>>>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c >>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c >>>>>>>>> @@ -10,6 +10,15 @@ >>>>>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >>>>>>>>> struct msm_gpu *gpu, int sysprof) >>>>>>>>> { >>>>>>>>> + int ret = 0; >>>>>>>>> + >>>>>>>>> + mutex_lock(&gpu->lock); >>>>>>>>> + >>>>>>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) > >>>>>>>>> 1)) { >>>>>>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); >>>>>>>>> + goto out_unlock; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> /* >>>>>>>>> * Since pm_runtime and sysprof_active are both refcounts, we >>>>>>>>> * call apply the new value first, and then unwind the previous >>>>>>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct >>>>>>>>> msm_file_private *ctx, >>>>>>>>> switch (sysprof) { >>>>>>>>> default: >>>>>>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", >>>>>>>>> sysprof); >>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); >>>>>>>>> + goto out_unlock; >>>>>>>>> case 2: >>>>>>>>> pm_runtime_get_sync(&gpu->pdev->dev); >>>>>>>>> fallthrough; >>>>>>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct >>>>>>>>> msm_file_private *ctx, >>>>>>>>> ctx->sysprof = sysprof; >>>>>>>>> - return 0; >>>>>>>>> +out_unlock: >>>>>>>>> + mutex_unlock(&gpu->lock); >>>>>>>>> + >>>>>>>>> + return ret; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx, >>>>>>>>> + struct msm_gpu *gpu, int reqctrs) >>>>>>>>> +{ >>>>>>>>> + int ret = 0; >>>>>>>>> + >>>>>>>>> + mutex_lock(&gpu->lock); >>>>>>>>> + >>>>>>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { >>>>>>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active"); >>>>>>>>> + goto out_unlock; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (reqctrs) { >>>>>>>>> + if (ctx->counters_requested) { >>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested"); >>>>>>>>> + goto out_unlock; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + ctx->counters_requested = true; >>>>>>>>> + refcount_inc(&gpu->local_counters_active); >>>>>>>>> + } else { >>>>>>>>> + if (!ctx->counters_requested) { >>>>>>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested"); >>>>>>>>> + goto out_unlock; >>>>>>>>> + } >>>>>>>>> + refcount_dec(&gpu->local_counters_active); >>>>>>>>> + ctx->counters_requested = false; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> +out_unlock: >>>>>>>>> + mutex_unlock(&gpu->lock); >>>>>>>>> + >>>>>>>>> + return ret; >>>>>>>>> } >>>>>>>>> void __msm_file_private_destroy(struct kref *kref) >>>>>>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h >>>>>>>>> index 2342cb90857e..ae7fb355e4a1 100644 >>>>>>>>> --- a/include/uapi/drm/msm_drm.h >>>>>>>>> +++ b/include/uapi/drm/msm_drm.h >>>>>>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec { >>>>>>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ >>>>>>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ >>>>>>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ >>>>>>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra- >>>>>>>>> submit) perfcntr usage */ >>>>>>>>> /* For backwards compat. The original support for preemption was >>>>>>>>> based on >>>>>>>>> * a single ring per priority level so # of priority levels equals >>>>>>>>> the # >>>>>>>> >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>> >>>>> >>>>> >>>>> Best regards, >>>> >>