> -----Original Message-----
> From: Zeng, Oak <oak.zeng@xxxxxxxxx>
> Sent: Friday, January 10, 2025 4:30 AM
> To: Belgaumkar, Vinay <vinay.belgaumkar@xxxxxxxxx>; Upadhyay, Tejas <tejas.upadhyay@xxxxxxxxx>; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; Nilawar, Badal <badal.nilawar@xxxxxxxxx>; Mrozek, Michal <michal.mrozek@xxxxxxxxx>; Morek, Szymon <szymon.morek@xxxxxxxxx>; Souza, Jose <jose.souza@xxxxxxxxx>; De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
> Subject: RE: [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency
>
>
>
> > -----Original Message-----
> > From: Belgaumkar, Vinay <vinay.belgaumkar@xxxxxxxxx>
> > Sent: January 9, 2025 5:03 PM
> > To: Zeng, Oak <oak.zeng@xxxxxxxxx>; Upadhyay, Tejas <tejas.upadhyay@xxxxxxxxx>; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; Nilawar, Badal <badal.nilawar@xxxxxxxxx>; Mrozek, Michal <michal.mrozek@xxxxxxxxx>; Morek, Szymon <szymon.morek@xxxxxxxxx>; Souza, Jose <jose.souza@xxxxxxxxx>; De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>
> > Subject: Re: [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency
> >
> >
> > On 1/9/2025 9:37 AM, Zeng, Oak wrote:
> > >
> > >> -----Original Message-----
> > >> From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tejas Upadhyay
> > >> Sent: January 9, 2025 7:07 AM
> > >> To: intel-xe@xxxxxxxxxxxxxxxxxxxxx
> > >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; Nilawar, Badal <badal.nilawar@xxxxxxxxx>; Belgaumkar, Vinay <vinay.belgaumkar@xxxxxxxxx>; Mrozek, Michal <michal.mrozek@xxxxxxxxx>; Morek, Szymon <szymon.morek@xxxxxxxxx>; Souza, Jose <jose.souza@xxxxxxxxx>; De Marchi, Lucas <lucas.demarchi@xxxxxxxxx>; Upadhyay, Tejas <tejas.upadhyay@xxxxxxxxx>
> > >> Subject: [RFC PATCH V2] drm/xe/guc: Use exec queue hints for GT frequency
> > >>
> > >> Allow the user to provide a low latency hint per exec queue. When set,
> > >> KMD sends a hint to GuC which results in special handling for this
> > >> exec queue. SLPC will ramp the GT frequency aggressively every time
> > >> it switches to this exec queue.
> > >>
> > >> We need to enable the use of the SLPC Compute strategy during init, but
> > >> it will apply only to exec queues that set this bit during exec
> > >> queue creation.
> > >>
> > >> Improvement with this approach is as below:
> > >>
> > >> Before,
> > >>
> > >> :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
> > >> Platform: Intel(R) OpenCL Graphics
> > >> Device: Intel(R) Graphics [0xe20b]
> > >> Driver version : 24.52.0 (Linux x64)
> > >> Compute units : 160
> > >> Clock frequency : 2850 MHz
> > >> Kernel launch latency : 283.16 us
> > >>
> > >> After,
> > >>
> > >> :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
> > >> Platform: Intel(R) OpenCL Graphics
> > >> Device: Intel(R) Graphics [0xe20b]
> > >> Driver version : 24.52.0 (Linux x64)
> > >> Compute units : 160
> > >> Clock frequency : 2850 MHz
> > >>
> > >> Kernel launch latency : 63.38 us
> > >>
> > >> UMD will indicate the low latency hint with a flag as mentioned below,
> > >>
> > >>  * struct drm_xe_exec_queue_create exec_queue_create = {
> > >>  *     .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT or 0
> > >>  *     .extensions = 0,
> > >>  *     .vm_id = vm,
> > >>  *     .num_bb_per_exec = 1,
> > >>  *     .num_eng_per_bb = 1,
> > >>  *     .instances = to_user_pointer(&instance),
> > >>  * };
> > >>  * ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);
> > >>
> > >> Link to UMD PR: https://github.com/intel/compute-runtime/pull/794
> > >>
> > >> Note: There is an outstanding issue on the GuC side where it is not able
> > >> to switch to max frequency as per the strategy indicated by KMD, so for
> > >> experiment/test results a hardcoding approach was taken and passed to GuC
> > >> as policy. Effort on debugging from the GuC side is going on in parallel.
> > >>
> > >> V2:
> > >>   - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 is already planned for another hint (Szymon)
> > >>   - Add motivation to description (Lucas)
> > >>
> > >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > >> Cc: vinay.belgaumkar@xxxxxxxxx
> > >> Cc: Michal Mrozek <michal.mrozek@xxxxxxxxx>
> > >> Cc: Szymon Morek <szymon.morek@xxxxxxxxx>
> > >> Cc: José Roberto de Souza <jose.souza@xxxxxxxxx>
> > >> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@xxxxxxxxx>
> > >> ---
> > >>  drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h |  3 +++
> > >>  drivers/gpu/drm/xe/xe_exec_queue.c            |  7 ++++---
> > >>  drivers/gpu/drm/xe/xe_guc_pc.c                | 16 ++++++++++++++++
> > >>  drivers/gpu/drm/xe/xe_guc_submit.c            |  7 +++++++
> > >>  include/uapi/drm/xe_drm.h                     |  3 ++-
> > >>  5 files changed, 32 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
> > >> index 85abe4f09ae2..c50075b8270f 100644
> > >> --- a/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
> > >> +++ b/drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
> > >> @@ -174,6 +174,9 @@ struct slpc_task_state_data {
> > >>      };
> > >>  } __packed;
> > >>
> > >> +#define SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE    REG_BIT(28)
> > >> +#define SLPC_OPTIMIZED_STRATEGY_COMPUTE        REG_BIT(0)
> > >> +
> > >>  struct slpc_shared_data_header {
> > >>      /* Total size in bytes of this shared buffer. */
> > >>      u32 size;
> > >> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > >> index 8948f50ee58f..7747ba6c4bb8 100644
> > >> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > >> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > >> @@ -553,7 +553,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> > >>      u32 len;
> > >>      int err;
> > >>
> > >> -    if (XE_IOCTL_DBG(xe, args->flags) ||
> > >> +    if (XE_IOCTL_DBG(xe, args->flags &&
> > >> +                     !(args->flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)) ||
> > >>          XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > >>          return -EINVAL;
> > >>
> > >> @@ -578,7 +579,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> > >>
> > >>          for_each_tile(tile, xe, id) {
> > >>              struct xe_exec_queue *new;
> > >> -            u32 flags = EXEC_QUEUE_FLAG_VM;
> > >> +            u32 flags = args->flags | EXEC_QUEUE_FLAG_VM;
> > >
> > > You are mixing internal and external flags here. args->flags is an external definition. Note that the current value of
> > > DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT conflicts with the internal definition of:
> > >
> > > #define EXEC_QUEUE_FLAG_PERMANENT	BIT(1)
> > >
> > > I think the better way to do it is to define an internal value for this purpose, such as:
> > >
> > > #define EXEC_QUEUE_FLAG_LOW_LATENCY	BIT(5)
> > >
> > > Then write:
> > >
> > >     if (args->flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
> > >         flags |= EXEC_QUEUE_FLAG_LOW_LATENCY;
> > >
> > >
> > >>              if (id)
> > >>                  flags |= EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD;
> > >> @@ -626,7 +627,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> > >>      }
> > >>
> > >>      q = xe_exec_queue_create(xe, vm, logical_mask,
> > >> -                 args->width, hwe, 0,
> > >> +                 args->width, hwe, args->flags,
> > >
> > > Use the internal flag here as well if you do what I said above.
> > >
> > >
> > >>                   args->extensions);
> > >>      up_read(&vm->lock);
> > >>      xe_vm_put(vm);
> > >> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> > >> index df7f130fb663..ff0b98ccf1a7 100644
> > >> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> > >> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> > >> @@ -992,6 +992,19 @@ static int pc_init_freqs(struct xe_guc_pc *pc)
> > >>      return ret;
> > >>  }
> > >>
> > >> +static int xe_guc_pc_set_strategy(struct xe_guc_pc *pc, u32 val)
> > >> +{
> > >> +    int ret = 0;
> > >> +
> > >> +    xe_pm_runtime_get(pc_to_xe(pc));
> > >> +    ret = pc_action_set_param(pc,
> > >> +                              SLPC_PARAM_STRATEGIES,
> > >> +                              val);
> > >> +    xe_pm_runtime_put(pc_to_xe(pc));
> > >> +
> > >> +    return ret;
> > >> +}
> > >> +
> > >>  /**
> > >>   * xe_guc_pc_start - Start GuC's Power Conservation component
> > >>   * @pc: Xe_GuC_PC instance
> > >> @@ -1052,6 +1065,9 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
> > >>
> > >>      ret = pc_action_setup_gucrc(pc, GUCRC_FIRMWARE_CONTROL);
> > >>
> > >> +    /* Enable SLPC Optimized Strategy for compute */
> > >> +    xe_guc_pc_set_strategy(pc, SLPC_OPTIMIZED_STRATEGY_COMPUTE);
> > >> +
> > >>  out:
> > >>      xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > >>      return ret;
> > >> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > >> index 9c36329fe857..88a1987ac360 100644
> > >> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > >> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > >> @@ -15,6 +15,7 @@
> > >>  #include <drm/drm_managed.h>
> > >>
> > >>  #include "abi/guc_actions_abi.h"
> > >> +#include "abi/guc_actions_slpc_abi.h"
> > >>  #include "abi/guc_klvs_abi.h"
> > >>  #include "regs/xe_lrc_layout.h"
> > >>  #include "xe_assert.h"
> > >> @@ -400,6 +401,7 @@ static void __guc_exec_queue_policy_add_##func(struct exec_queue_policy *policy,
> > >>  MAKE_EXEC_QUEUE_POLICY_ADD(execution_quantum, EXECUTION_QUANTUM)
> > >>  MAKE_EXEC_QUEUE_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT)
> > >>  MAKE_EXEC_QUEUE_POLICY_ADD(priority, SCHEDULING_PRIORITY)
> > >> +MAKE_EXEC_QUEUE_POLICY_ADD(slpc_ctx_freq_req, SLPM_GT_FREQUENCY)
> > >>  #undef MAKE_EXEC_QUEUE_POLICY_ADD
> > >>
> > >>  static const int xe_exec_queue_prio_to_guc[] = {
> > >> @@ -414,14 +416,19 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
> > >>      struct exec_queue_policy policy;
> > >>      enum xe_exec_queue_priority prio = q->sched_props.priority;
> > >>      u32 timeslice_us = q->sched_props.timeslice_us;
> > >> +    u32 slpc_ctx_freq_req = 0;
> > >>      u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
> > >>      xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> > >>
> > >> +    if (q->flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
> > >
> > > Use the internal definition.
> > >
> > >> +        slpc_ctx_freq_req |= SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE;
> > >
> > > From the code above, I feel the user hint flag would better be named
> > > something like DRM_XE_EXEC_QUEUE_COMPUTE_HINT.
> > >
> > > I feel this is slightly better than LOW_LATENCY, as low latency is a result
> > > of applying a frequency policy designed for compute tasks.
> >
> > We had this discussion while implementing it on the i915 side. It's
> > better to keep it engine non-specific, that way non-compute (3D)
> > contexts can use it as well.
>
> Ok, then COMPUTE is not the correct word.
>
> I still think LOW_LATENCY is not the right wording here, i.e., during submission
> the result of applying such a policy is low latency, but during computation the
> result can be faster.
>
> A few other wordings that come to my mind are: HIGH_FREQUENCY,
> AGGRESSIVE_FREQUENCY, INTENSIVE_FREQUENCY or VIGOROUS_FREQUENCY.
>
> If LOW_LATENCY was accepted at i915 time, maybe keep it.

Sure,
Tejas

> > >
> > > A related question is, does this new policy affect power consumption?
> > > Usually higher frequency implies higher power.
> > >
> > > Do we only want to keep such a frequency policy during submission time, or
> > > is the same policy intended for the whole execution time?
> >
> > SLPC will switch to aggressive frequency strategies when that
> > particular context is switched in. Other contexts are not affected.
> > There is definitely a power impact since we are aggressively ramping
> > the frequency for this context.
>
> So the policy is applied to both submission and execution for the context.
>
> > >
> > > The answer to the above question would lead to some interface design
> > > decisions, such as whether we need to introduce an interface to disable
> > > the compute frequency policy.
> >
> > It can be disabled on the fly for any context, since this is just
> > another SLPC param. However, I believe the upstream policy is to not
> > mutate the context after it has been created.
>
> Ok, as long as we are aware of the power consumption change after we apply
> this policy, and people are not complaining about it, making it immutable is fine.
>
> Oak
>
> >
> > Thanks,
> >
> > Vinay.
> > >
> > >> +
> > >>      __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
> > >>      __guc_exec_queue_policy_add_priority(&policy, xe_exec_queue_prio_to_guc[prio]);
> > >>      __guc_exec_queue_policy_add_execution_quantum(&policy, timeslice_us);
> > >>      __guc_exec_queue_policy_add_preemption_timeout(&policy, preempt_timeout_us);
> > >> +    __guc_exec_queue_policy_add_slpc_ctx_freq_req(&policy, slpc_ctx_freq_req);
> > >>
> > >>      xe_guc_ct_send(&guc->ct, (u32 *)&policy.h2g,
> > >>                     __guc_exec_queue_policy_action_size(&policy),
> > >>                     0, 0);
> > >> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > >> index f62689ca861a..bd0150d2200c 100644
> > >> --- a/include/uapi/drm/xe_drm.h
> > >> +++ b/include/uapi/drm/xe_drm.h
> > >> @@ -1097,6 +1097,7 @@ struct drm_xe_vm_bind {
> > >>   *         .engine_class = DRM_XE_ENGINE_CLASS_RENDER,
> > >>   *     };
> > >>   *     struct drm_xe_exec_queue_create exec_queue_create = {
> > >> + *          .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT or 0
> > >>   *          .extensions = 0,
> > >>   *          .vm_id = vm,
> > >>   *          .num_bb_per_exec = 1,
> > >> @@ -1110,7 +1111,6 @@ struct drm_xe_exec_queue_create {
> > >>  #define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY        0
> > >>  #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY       0
> > >>  #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE      1
> > >> -
> > >>      /** @extensions: Pointer to the first extension struct, if any */
> > >>      __u64 extensions;
> > >>
> > >> @@ -1123,6 +1123,7 @@ struct drm_xe_exec_queue_create {
> > >>      /** @vm_id: VM to use for this exec queue */
> > >>      __u32 vm_id;
> > >>
> > >> +#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT    (0x1 << 1)
> > >
> > > I wonder why the flag can't start from (0x1 << 0) here.
> > >
> > > Yes, I did see the V2 comment below:
> > >
> > > - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 is already planned for another hint (Szymon)
> > >
> > > but this is regarding the definition of a flag used in exec_queue_create, and this
> > > is the first time we use this flag in *this* uAPI. It shouldn't conflict
> > > with UMD's usage of hint flags in *other* uAPIs.
> > >
> > >>      /** @flags: MBZ */
> > >
> > > Remove this MBZ comment.
> > >
> > > Oak
> > >
> > >
> > >>      __u32 flags;
> > >>
> > >> --
> > >> 2.34.1
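[Editor's note: for readers following the flag-translation point raised in the review above, here is a minimal, self-contained sketch of the approach Oak describes, i.e. translating the uAPI hint bit into a driver-internal exec queue flag instead of passing args->flags straight through. The helper name xe_exec_queue_flags_from_uapi and the internal bit positions below are illustrative assumptions, not the actual xe driver definitions.]

```c
/*
 * Illustrative sketch only (not the actual xe driver code): keep uAPI flag
 * bits and driver-internal exec queue flag bits in separate namespaces and
 * translate explicitly, so a uAPI bit can never collide with an internal
 * bit such as EXEC_QUEUE_FLAG_PERMANENT.
 */

/* uAPI flag, as proposed in the patch */
#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT	(0x1 << 1)

/* driver-internal exec queue flags (example bit positions) */
#define EXEC_QUEUE_FLAG_VM			(0x1 << 4)
#define EXEC_QUEUE_FLAG_LOW_LATENCY		(0x1 << 5)

static unsigned int xe_exec_queue_flags_from_uapi(unsigned int uapi_flags)
{
	unsigned int flags = EXEC_QUEUE_FLAG_VM;

	/* translate the user hint into the internal low-latency flag */
	if (uapi_flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
		flags |= EXEC_QUEUE_FLAG_LOW_LATENCY;

	return flags;
}
```

With this split, init_policies() would test the internal bit (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY) rather than the uAPI define when deciding whether to set SLPC_EXEC_QUEUE_FREQ_REQ_IS_COMPUTE.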