Re: [Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC

On 18/10/2022 23:15, Vinay Belgaumkar wrote:
Waitboost (when SLPC is enabled) results in an H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request RP0 if GuC is already requesting the
same.

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@xxxxxxxxx>
---
  drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index fc23c562d9b2..a20ae4fceac8 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
  void intel_rps_boost(struct i915_request *rq)
  {
  	struct intel_guc_slpc *slpc;
+	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
  	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
  		return;
+	/* If GuC is already requesting RP0, skip */
+	if (rps_uses_slpc(rps)) {
+		slpc = rps_to_slpc(rps);
+		if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
+			return;
+	}
+

Feels a little bit like a layering violation. Waitboost reference counts and request markings will now change based on asynchronous state - an mmio read.

Also, a little below we have this:

"""
	/* Serializes with i915_request_retire() */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;

		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);

			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters))

***>>>> Wouldn't it skip doing anything here already? atomic_fetch_inc() returns the old value, so only the first waiter (0 -> 1) schedules boost_work; every later boost already bails out. <<<<***

				schedule_work(&slpc->boost_work);

			return;
		}

		if (atomic_fetch_inc(&rps->num_waiters))
			return;
"""

But I wonder if this is not a layering violation already. Looks like one to me at the moment. And as it happens there is an ongoing debug of clvk slowness where I was a bit puzzled by the lack of "boost fence" in the trace_printk logs - but now I see how that happens: the SLPC branch returns before the GT_TRACE("boost fence") call at the bottom of intel_rps_boost(). Does not feel right to me that we lose that tracing with SLPC.

So in general - why wouldn't the correct approach be to solve this in the worker, which perhaps should fork to an SLPC-specific branch and do the consolidation/skips based on mmio reads in there? Rough sketch below.
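Something along these lines perhaps - a rough, untested sketch only, and I am assuming slpc_to_gt(), slpc_force_min_freq() and intel_rps_get_requested_frequency() are all usable from the worker:

"""
static void slpc_boost_work(struct work_struct *work)
{
	struct intel_guc_slpc *slpc =
		container_of(work, typeof(*slpc), boost_work);
	struct intel_rps *rps = &slpc_to_gt(slpc)->rps;

	/* Raced with intel_rps_dec_waiters()? Nothing left to boost. */
	if (!atomic_read(&slpc->num_waiters))
		return;

	/*
	 * The mmio read of the current GuC request happens here, next to
	 * the H2G it is meant to elide, instead of in intel_rps_boost().
	 */
	if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
		return;

	slpc_force_min_freq(slpc, slpc->boost_freq);
}
"""

That way intel_rps_boost() stays purely about reference counting and marking requests, and the "GuC already at RP0" decision is made at the point where the H2G would actually be sent.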

Regards,

Tvrtko

  	/* Serializes with i915_request_retire() */
  	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
-		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
  		if (rps_uses_slpc(rps)) {
  			slpc = rps_to_slpc(rps);


