Re: [RFC] drm/msm: Add UABI to request perfcntr usage

Akhil P Oommen <quic_akhilpo@xxxxxxxxxxx> · Fri, 13 Dec 2024 22:20:03 +0530

On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
>> On 12/5/2024 10:24 PM, Rob Clark wrote:
>>> From: Rob Clark <robdclark@xxxxxxxxxxxx>
>>>
>>> Performance counter usage falls into two categories:
>>>
>>> 1. Local usage, where the counter configuration, start, and end read
>>>     happen within (locally to) a single SUBMIT.  In this case, there is
>>>     no dependency on counter configuration or values between submits,
>>> and
>>>     in fact counters are normally cleared on context switches, making it
>>>     impossible to rely on cross-submit state.
>>>
>>> 2. Global usage, where a single privilaged daemon/process is sampling
>>>     counter values across all processes for profiling.
>>>
>>> The two categories are mutually exclusive.  While you can have many
>>> processes making local counter usage, you cannot combine global and
>>> local usage without the two stepping on each others feet (by changing
>>> counter configuration).
>>>
>>> For global counter usage, there is already a SYSPROF param (since global
>>> counter usage requires disabling counter clearing on context switch).
>>> This patch adds a REQ_CNTRS param to request local counter usage.  If
>>> one or more processes has requested counter usage, then a SYSPROF
>>> request will fail with -EBUSY.  And if SYSPROF is active, then REQ_CNTRS
>>> will fail with -EBUSY, maintaining the mutual exclusivity.
>>>
>>> This is purely an advisory interface to help coordinate userspace.
>>> There is no real means of enforcement, but the worst that can happen if
>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
>>> profiling data.
>>>
>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
>>> ---
>>> kgsl takes a different approach, which involves a lot more UABI for
>>> assigning counters to different processes.  But I think by taking
>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
>>> counters they need in each SUBMIT, for their respective gl/vk perf-
>>> counter extensions, we can take this simpler approach.
>>
>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
>> that will come up in future generations). How will we ensure that here?
>>
>> I have plans to bring up IFPC support in near future. Also, I brought up
>> this point during preemption series. But from the responses, I felt that
>> profiling was not considered a serious usecase. Still I wonder how the
>> perfcounter extensions work accurately with preemption.
> 
> So back then I implemented the postamble IB to clear perf counters and
> that gets disabled when sysprof (so global usage) is happening. The
> kernel is oblivious to "Local isage" of profiling but in that case
> really what we want to do is disable preemption which in my
> understanding can be done from userspace with a PKT. In my understanding
> this had us covered for all usecases.

I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
you mean?

-Akhil.

> 
> So what would you expect instead we should do kernel side to make
> profiling preemption safe?
> 
>>
>> -Akhil
>>
>>>
>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +
>>>   drivers/gpu/drm/msm/msm_drv.c           |  5 ++-
>>>   drivers/gpu/drm/msm/msm_gpu.c           |  1 +
>>>   drivers/gpu/drm/msm/msm_gpu.h           | 29 +++++++++++++-
>>>   drivers/gpu/drm/msm/msm_submitqueue.c   | 52 ++++++++++++++++++++++++-
>>>   include/uapi/drm/msm_drm.h              |  1 +
>>>   6 files changed, 85 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
>>> drm/msm/adreno/adreno_gpu.c
>>> index 31bbf2c83de4..f688e37059b8 100644
>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
>>> msm_file_private *ctx,
>>>           if (!capable(CAP_SYS_ADMIN))
>>>               return UERR(EPERM, drm, "invalid permissions");
>>>           return msm_file_private_set_sysprof(ctx, gpu, value);
>>> +    case MSM_PARAM_REQ_CNTRS:
>>> +        return msm_file_private_request_counters(ctx, gpu, value);
>>>       default:
>>>           return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
>>> >name, param);
>>>       }
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
>>> msm_drv.c
>>> index 6416d2cb4efc..bf8314ff4a25 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
>>> *dev, struct drm_file *file)
>>>        * It is not possible to set sysprof param to non-zero if gpu
>>>        * is not initialized:
>>>        */
>>> -    if (priv->gpu)
>>> +    if (ctx->sysprof)
>>>           msm_file_private_set_sysprof(ctx, priv->gpu, 0);
>>>   +    if (ctx->counters_requested)
>>> +        msm_file_private_request_counters(ctx, priv->gpu, 0);
>>> +
>>>       context_close(ctx);
>>>   }
>>>   diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
>>> msm_gpu.c
>>> index 82f204f3bb8f..013b59ca3bb1 100644
>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
>>> platform_device *pdev,
>>>       gpu->nr_rings = nr_rings;
>>>         refcount_set(&gpu->sysprof_active, 1);
>>> +    refcount_set(&gpu->local_counters_active, 1);
>>>         return 0;
>>>   diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
>>> msm_gpu.h
>>> index e25009150579..83c61e523b1b 100644
>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>> @@ -195,12 +195,28 @@ struct msm_gpu {
>>>       int nr_rings;
>>>         /**
>>> -     * sysprof_active:
>>> +     * @sysprof_active:
>>>        *
>>> -     * The count of contexts that have enabled system profiling.
>>> +     * The count of contexts that have enabled system profiling plus
>>> one.
>>> +     *
>>> +     * Note: refcount_t does not like 0->1 transitions.. we want to
>>> keep
>>> +     * the under/overflow checks that refcount_t provides, but allow
>>> +     * multiple on/off transitions so we track the logical value
>>> plus one.)
>>>        */
>>>       refcount_t sysprof_active;
>>>   +    /**
>>> +     * @local_counters_active:
>>> +     *
>>> +     * The count of contexts that have requested local (intra-submit)
>>> +     * performance counter usage plus one.
>>> +     *
>>> +     * Note: refcount_t does not like 0->1 transitions.. we want to
>>> keep
>>> +     * the under/overflow checks that refcount_t provides, but allow
>>> +     * multiple on/off transitions so we track the logical value
>>> plus one.)
>>> +     */
>>> +    refcount_t local_counters_active;
>>> +
>>>       /**
>>>        * lock:
>>>        *
>>> @@ -383,6 +399,13 @@ struct msm_file_private {
>>>        */
>>>       int sysprof;
>>>   +    /**
>>> +     * @counters_requested:
>>> +     *
>>> +     * Has the context requested local perfcntr usage.
>>> +     */
>>> +    bool counters_requested;
>>> +
>>>       /**
>>>        * comm: Overridden task comm, see MSM_PARAM_COMM
>>>        *
>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
>>>     int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>                    struct msm_gpu *gpu, int sysprof);
>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>> +                      struct msm_gpu *gpu, int reqcntrs);
>>>   void __msm_file_private_destroy(struct kref *kref);
>>>     static inline void msm_file_private_put(struct msm_file_private
>>> *ctx)
>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
>>> msm/msm_submitqueue.c
>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
>>> @@ -10,6 +10,15 @@
>>>   int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>                    struct msm_gpu *gpu, int sysprof)
>>>   {
>>> +    int ret = 0;
>>> +
>>> +    mutex_lock(&gpu->lock);
>>> +
>>> +    if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) {
>>> +        ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
>>> +        goto out_unlock;
>>> +    }
>>> +
>>>       /*
>>>        * Since pm_runtime and sysprof_active are both refcounts, we
>>>        * call apply the new value first, and then unwind the previous
>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
>>> msm_file_private *ctx,
>>>         switch (sysprof) {
>>>       default:
>>> -        return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>> +        ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>> +        goto out_unlock;
>>>       case 2:
>>>           pm_runtime_get_sync(&gpu->pdev->dev);
>>>           fallthrough;
>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
>>> msm_file_private *ctx,
>>>         ctx->sysprof = sysprof;
>>>   -    return 0;
>>> +out_unlock:
>>> +    mutex_unlock(&gpu->lock);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>> +                      struct msm_gpu *gpu, int reqctrs)
>>> +{
>>> +    int ret = 0;
>>> +
>>> +    mutex_lock(&gpu->lock);
>>> +
>>> +    if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
>>> +        ret = UERR(EBUSY, gpu->dev, "System profiling active");
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    if (reqctrs) {
>>> +        if (ctx->counters_requested) {
>>> +            ret = UERR(EINVAL, gpu->dev, "Already requested");
>>> +            goto out_unlock;
>>> +        }
>>> +
>>> +        ctx->counters_requested = true;
>>> +        refcount_inc(&gpu->local_counters_active);
>>> +    } else {
>>> +        if (!ctx->counters_requested) {
>>> +            ret = UERR(EINVAL, gpu->dev, "Not requested");
>>> +            goto out_unlock;
>>> +        }
>>> +        refcount_dec(&gpu->local_counters_active);
>>> +        ctx->counters_requested = false;
>>> +    }
>>> +
>>> +out_unlock:
>>> +    mutex_unlock(&gpu->lock);
>>> +
>>> +    return ret;
>>>   }
>>>     void __msm_file_private_destroy(struct kref *kref)
>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>>> index 2342cb90857e..ae7fb355e4a1 100644
>>> --- a/include/uapi/drm/msm_drm.h
>>> +++ b/include/uapi/drm/msm_drm.h
>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
>>>   #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
>>>   #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
>>>   #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
>>> +#define MSM_PARAM_REQ_CNTRS  0x15 /* WO: request "local" (intra-
>>> submit) perfcntr usage  */
>>>     /* For backwards compat.  The original support for preemption was
>>> based on
>>>    * a single ring per priority level so # of priority levels equals
>>> the #
>>
> 
> 
> Best regards,