On 18/10/2022 23:15, Vinay Belgaumkar wrote:
Waitboost (when SLPC is enabled) results in an H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request RP0 if GuC is already requesting the
same.
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index fc23c562d9b2..a20ae4fceac8 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
 void intel_rps_boost(struct i915_request *rq)
 {
 	struct intel_guc_slpc *slpc;
+	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
 	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
 		return;
+	/* If GuC is already requesting RP0, skip */
+	if (rps_uses_slpc(rps)) {
+		slpc = rps_to_slpc(rps);
+		if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
+			return;
+	}
+
Feels a little bit like a layering violation. Wait boost reference
counts and request markings will change based on asynchronous state - an
mmio read.
Also, a little below we have this:
"""
	/* Serializes with i915_request_retire() */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);
			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters))
***>>>> Wouldn't it skip doing anything here already? <<<<***
				schedule_work(&slpc->boost_work);
			return;
		}
		if (atomic_fetch_inc(&rps->num_waiters))
			return;
"""
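To spell out what I mean about the existing skip: atomic_fetch_inc() returns the value from before the increment, so only the first waiter (the 0 -> 1 transition) queues the worker. Below is a rough userspace model of just that branch, using C11 atomics. The struct and function names (model_slpc, model_boost) are made up for illustration and are not the i915 ones.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SLPC waiter state; not the i915 struct. */
struct model_slpc {
	atomic_int num_waiters;
	int work_scheduled;	/* counts how often the worker was queued */
};

/*
 * Mirrors the quoted branch: atomic_fetch_add() returns the value held
 * before the increment, so only the 0 -> 1 transition queues the worker.
 * Returns true when this caller was the one to queue it.
 */
static bool model_boost(struct model_slpc *slpc)
{
	if (atomic_fetch_add(&slpc->num_waiters, 1) == 0) {
		slpc->work_scheduled++;
		return true;
	}
	return false;
}
```

So with a waiter already counted, a second boost only bumps the count and queues nothing.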
But I wonder if this is not a layering violation already. Looks like one
to me at the moment. And as it happens there is an ongoing debug of
clvk slowness where I was a bit puzzled by the lack of "boost fence" in
trace_printk logs - but now I see how that happens. Does not feel right
to me that we lose that tracing with SLPC.
So in general - why wouldn't the correct approach be to solve this in
the worker - which perhaps should fork to an SLPC-specific branch and do
the consolidations/skips based on mmio reads in there?
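Roughly the shape I have in mind, as a self-contained userspace model rather than a patch. Everything here is hypothetical: model_rps, its fields and both functions are invented for illustration, requested_freq stands in for the mmio read, and updating it after a send assumes GuC honours the request.

```c
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct model_rps {
	bool uses_slpc;
	unsigned int requested_freq;	/* stands in for the mmio read */
	unsigned int rp0_freq;
	unsigned int num_waiters;
	unsigned int h2g_sent;		/* H2G messages issued by the worker */
	unsigned int boosts_traced;	/* "boost fence" trace points hit */
};

/* Caller side: always trace and count the waiter, no async state consulted. */
static void model_boost_request(struct model_rps *rps)
{
	rps->boosts_traced++;
	rps->num_waiters++;
}

/* Worker side: the only place that looks at the requested frequency. */
static void model_boost_work(struct model_rps *rps)
{
	if (!rps->num_waiters)
		return;

	if (rps->uses_slpc) {
		/* SLPC branch: skip the H2G if RP0 is already requested. */
		if (rps->requested_freq == rps->rp0_freq)
			return;
		rps->h2g_sent++;
		/* Assumption: GuC honours the request. */
		rps->requested_freq = rps->rp0_freq;
		return;
	}

	/* Non-SLPC branch: set the frequency directly. */
	rps->requested_freq = rps->rp0_freq;
}
```

That way the request marking, waiter counts and tracing stay independent of the asynchronous state, and only the worker decides whether a message is worth sending.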
Regards,
Tvrtko
 	/* Serializes with i915_request_retire() */
 	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
-		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
 		if (rps_uses_slpc(rps)) {
 			slpc = rps_to_slpc(rps);