On Fri, 21 Oct 2022 11:24:42 -0700, Belgaumkar, Vinay wrote:
>
>
> On 10/20/2022 4:36 PM, Dixit, Ashutosh wrote:
> > On Thu, 20 Oct 2022 13:16:00 -0700, Belgaumkar, Vinay wrote:
> >> On 10/20/2022 11:33 AM, Dixit, Ashutosh wrote:
> >>> On Wed, 19 Oct 2022 17:29:44 -0700, Vinay Belgaumkar wrote:
> >>> Hi Vinay,
> >>>
> >>>> Waitboost (when SLPC is enabled) results in a H2G message. This can result
> >>>> in thousands of messages during a stress test and fill up an already full
> >>>> CTB. There is no need to request for RP0 if GuC is already requesting the
> >>>> same.
> >>> But how are we sure that the freq will remain at RP0 in the future (when
> >>> the waiting request or any requests which are ahead execute)?
> >>>
> >>> In the current waitboost implementation, set_param is sent to GuC ahead of
> >>> the waiting request to ensure that the freq would be max when this waiting
> >>> request executed on the GPU and the freq is kept at max till this request
> >>> retires (considering just one waiting request). How can we ensure this if
> >>> we don't send the waitboost set_param to GuC?
> >> There is no way to guarantee the frequency will remain at RP0 till the
> >> request retires. As a theoretical example, let's say the request boosted
> >> freq to RP0, but a user changed min freq using sysfs immediately after.
> > That would be a bug. If waitboost is in progress and in the middle the user
> > changed min freq, I would expect the freq to revert to the new min only
> > after the waitboost phase was over.
> The problem here is that GuC is unaware of this "boosting"
> phenomenon. Setting the min_freq_softlimit as well to boost when we send a
> boost request might help with this issue.
> >
> > In any case, I am not referring to this case. Since FW controls the freq
> > there is nothing preventing FW from changing the freq unless we raise min to
> > max, which is what waitboost does.
> Ok, so maybe the solution here is to check if min_softlimit is already at
> boost freq, as it tracks the min freq changes. That should take care of
> server parts automatically as well.

Correct, yes that would be the right way to do it. Thanks.

--
Ashutosh

> >> Waitboost is done by a pending request to "hurry" the current requests. If
> >> GT is already at boost frequency, that purpose is served.
> > FW can bring the freq down later before the waiting request is scheduled.
> >> Also, host algorithm already has this optimization as well.
> > Host turbo is different from SLPC. Host turbo controls the freq algorithm
> > so it knows freq will not come down till it itself brings the freq
> > down. Unlike SLPC where FW is controlling the freq. Therefore host turbo
> > doesn't ever need to do an MMIO read but only needs to refer to its own
> > state (rps->cur_freq etc.).
> True. Host algorithm has a periodic timer where it updates frequency. Here,
> it checks num_waiters and sets client_boost every time that is non-zero.
> >>> I had assumed we'll do this optimization for server parts where min is
> >>> already RP0 in which case we can completely disable waitboost. But this
> >>> patch is something else.
> Hopefully the softlimit changes above will help with client and server.
>
> Thanks,
> Vinay.
>
> > Thanks.
> > --
> > Ashutosh
> >
> >>>> v2: Add the tracing back, and check requested freq
> >>>> in the worker thread (Tvrtko)
> >>>> v3: Check requested freq in dec_waiters as well
> >>>>
> >>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/gt/intel_rps.c         |  3 +++
> >>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 14 +++++++++++---
> >>>>  2 files changed, 14 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> >>>> index fc23c562d9b2..18b75cf08d1b 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> >>>> @@ -1016,6 +1016,9 @@ void intel_rps_boost(struct i915_request *rq)
> >>>>  	if (rps_uses_slpc(rps)) {
> >>>>  		slpc = rps_to_slpc(rps);
> >>>>
> >>>> +		GT_TRACE(rps_to_gt(rps), "boost fence:%llx:%llx\n",
> >>>> +			 rq->fence.context, rq->fence.seqno);
> >>>> +
> >>>>  		/* Return if old value is non zero */
> >>>>  		if (!atomic_fetch_inc(&slpc->num_waiters))
> >>>>  			schedule_work(&slpc->boost_work);
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> >>>> index b7cdeec44bd3..9dbdbab1515a 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> >>>> @@ -227,14 +227,19 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
> >>>>  static void slpc_boost_work(struct work_struct *work)
> >>>>  {
> >>>>  	struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work);
> >>>> +	struct intel_rps *rps = &slpc_to_gt(slpc)->rps;
> >>>>  	int err;
> >>>>
> >>>>  	/*
> >>>>  	 * Raise min freq to boost. It's possible that
> >>>>  	 * this is greater than current max. But it will
> >>>>  	 * certainly be limited by RP0. An error setting
> >>>> -	 * the min param is not fatal.
> >>>> +	 * the min param is not fatal. No need to boost
> >>>> +	 * if we are already requesting it.
> >>>>  	 */
> >>>> +	if (intel_rps_get_requested_frequency(rps) == slpc->boost_freq)
> >>>> +		return;
> >>>> +
> >>>>  	mutex_lock(&slpc->lock);
> >>>>  	if (atomic_read(&slpc->num_waiters)) {
> >>>>  		err = slpc_force_min_freq(slpc, slpc->boost_freq);
> >>>> @@ -728,6 +733,7 @@ int intel_guc_slpc_set_boost_freq(struct intel_guc_slpc *slpc, u32 val)
> >>>>
> >>>>  void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc)
> >>>>  {
> >>>> +	struct intel_rps *rps = &slpc_to_gt(slpc)->rps;
> >>>>  	/*
> >>>>  	 * Return min back to the softlimit.
> >>>>  	 * This is called during request retire,
> >>>> @@ -735,8 +741,10 @@ void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc)
> >>>>  	 * set_param fails.
> >>>>  	 */
> >>>>  	mutex_lock(&slpc->lock);
> >>>> -	if (atomic_dec_and_test(&slpc->num_waiters))
> >>>> -		slpc_force_min_freq(slpc, slpc->min_freq_softlimit);
> >>>> +	if (atomic_dec_and_test(&slpc->num_waiters)) {
> >>>> +		if (intel_rps_get_requested_frequency(rps) != slpc->min_freq_softlimit)
> >>>> +			slpc_force_min_freq(slpc, slpc->min_freq_softlimit);
> >>>> +	}
> >>>>  	mutex_unlock(&slpc->lock);
> >>>>  }
> >>>>
> >>>> --
> >>>> 2.35.1
> >>>>
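
For reference, a minimal sketch of the softlimit-based check agreed above. This is untested and not the actual patch: it assumes the surrounding intel_guc_slpc.c context, reuses only the fields and helpers visible in the quoted diff (min_freq_softlimit, boost_freq, num_waiters, slpc_force_min_freq), and omits the error bookkeeping for brevity.

/*
 * Sketch only: skip the boost set_param entirely when the min freq
 * softlimit already sits at the boost frequency (e.g. server parts where
 * min is pinned to RP0), instead of reading back the requested frequency.
 */
static void slpc_boost_work(struct work_struct *work)
{
	struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work);

	/*
	 * min_freq_softlimit tracks min freq changes from sysfs, so if it is
	 * already at boost_freq, GuC has already been asked for RP0 and
	 * another set_param H2G would only add CTB traffic.
	 */
	if (slpc->min_freq_softlimit >= slpc->boost_freq)
		return;

	mutex_lock(&slpc->lock);
	if (atomic_read(&slpc->num_waiters))
		slpc_force_min_freq(slpc, slpc->boost_freq);
	mutex_unlock(&slpc->lock);
}

Because the check only consults driver state, it also avoids the MMIO read behind intel_rps_get_requested_frequency(), in the same spirit as host turbo referring to its own rps->cur_freq. The same comparison could presumably gate the set_param in intel_guc_slpc_dec_waiters() as well.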