On 07/09/2018 09:26, Tvrtko Ursulin
wrote:
On 06/09/2018 11:36, Lionel Landwerlin wrote:
On 06/09/2018 11:22, Chris Wilson wrote:
Quoting Lionel Landwerlin (2018-09-06
11:18:01)
On 06/09/2018 11:10, Chris Wilson
wrote:
Quoting Lionel Landwerlin
(2018-09-06 10:57:47)
On 05/09/2018 15:22, Tvrtko
Ursulin wrote:
From: Lionel Landwerlin
<lionel.g.landwerlin@xxxxxxxxx>
If some of the contexts submitting workloads to the
GPU have been
configured to shutdown slices/subslices, we might
loose the NOA
configurations written in the NOA muxes.
One possible solution to this problem is to reprogram
the NOA muxes
when we switch to a new context. We initially tried
this in the
workaround batchbuffer but some concerns where raised
about the cost
of reprogramming at every context switch. This
solution is also not
without consequences from the userspace point of view.
Reprogramming
of the muxes can only happen once the powergating
configuration has
changed (which happens after context switch). This
means for a window
of time during the recording, counters recorded by the
OA unit might
be invalid. This requires userspace dealing with OA
reports to discard
the invalid values.
Minimizing the reprogramming could be implemented by
tracking of the
last programmed configuration somewhere in GGTT and
use MI_PREDICATE
to discard some of the programming commands, but the
command streamer
would still have to parse all the MI_LRI instructions
in the
workaround batchbuffer.
Another solution, which this change implements, is to
simply disregard
the user requested configuration for the period of
time when i915/perf
is active. There is no known issue with this apart
from a performance
penality for some media workloads that benefit from
running on a
partially powergated GPU. We already prevent RC6 from
affecting the
programming so it doesn't sound completely
unreasonable to hold on
powergating for the same reason.
v2: Leave RPCS programming in intel_lrc.c (Lionel)
v3: Update for s/union intel_sseu/struct intel_sseu/
(Lionel)
More to_intel_context() (Tvrtko)
s/dev_priv/i915/ (Tvrtko)
Tvrtko Ursulin:
v4:
* Rebase for make_rpcs changes.
v5:
* Apply OA restriction from make_rpcs directly.
v6:
* Rebase for context image setup changes.
Signed-off-by: Lionel Landwerlin
<lionel.g.landwerlin@xxxxxxxxx>
Signed-off-by: Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_perf.c | 5 +++++
drivers/gpu/drm/i915/intel_lrc.c | 30
++++++++++++++++++++----------
drivers/gpu/drm/i915/intel_lrc.h | 3 +++
3 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c
b/drivers/gpu/drm/i915/i915_perf.c
index ccb20230df2c..dd65b72bddd4 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1677,6 +1677,11 @@ static void
gen8_update_reg_state_unlocked(struct i915_gem_context
*ctx,
CTX_REG(reg_state, state_offset,
flex_regs[i], value);
}
+
+ CTX_REG(reg_state, CTX_R_PWR_CLK_STATE,
GEN8_R_PWR_CLK_STATE,
+ gen8_make_rpcs(dev_priv,
+
&to_intel_context(ctx,
+
dev_priv->engine[RCS])->sseu));
I think there is one issue I missed on the previous
iterations of this
patch.
This gen8_update_reg_state_unlocked() is called when the
GPU is parked
on the kernel context.
It's supposed to update all contexts, but I think we
might not be able
to update the kernel context image while the GPU is
using it.
The kernel context is only ever taken in extremis (you are
either
parking or stalling userspace) so I don't care.
The patch exposing the RPCS configuration to userspace will
make use of
the kernel context while OA/perf is enabled. Even if it
reprograms the
locked value that will break the power configuration
stability on Gen11
(because the locked configuration will be different from the
kernel
context configuration).
Sure, but as you point out that's only on changing
configuration.
What's missing in the patch is that we only bail early if the
new sseu
matches the ce->sseu, but that doesn't necessarily match
whats in the
context due to OA. (Or maybe I missed the conversion to rpcs
value and
checking.)
-Chris
Yep, because the gen8_make_rpcs() post processes the values
store at the gem context level, we risk rerunning the kernel
context to write the exiting value.
Sorry this is all so messy :(
Lets see if I managed to follow here.
The current code indeed bails out at the set ctx param level if
the requested state matches the ce->state. My thinking was that
ce->state is the master state and whatever happens in "post
processing" via gen8_make_rpcs should be hidden from it since the
design is that the i915_perf.c will re-configure all contexts when
the OA active status changes (to either direction).
So I don't see a problem in those two interactions.
Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff)
for ICL.
You then enable OA which locks the configuration at (0x1,0xf).
The kernel context has retained its (0x1/0xff) configuration.
And after you change the config of contextA to (0x1,0x7).
This would lead to the kernel context scheduled with (0x1,0xff)
while OA is active.
Apart from one, get_param_sseu will lie a bit - we can discuss
about this one more. At one point I suggested we have two sets of
masks in the uAPI, requested and active in a way. So userspace
could query what it set and what is actually active.
Now second issue is if i915_perf.c is able to reprogram the kernel
config.
Here its true, it will write to the context image and that will
get overwritten by context save.
If that is a problem for OA, I was initially if a throw-away
second "kernel" context could be use to re-program the real one,
but perhaps even simpler - what about a mmio write to program the
RPCS while kernel context is active?
Documentation says : "This register must not be programmed
directly through CPU MMIO cycle. "
Sorry :(
-
Lionel
Regards,
Tvrtko
|