Re: [RFC PATCH 04/20] drm/i915: Transform context WAs into static tables

Oscar Mateo <oscar.mateo@xxxxxxxxx> · Mon, 6 Nov 2017 10:54:04 -0800

On 11/06/2017 03:59 AM, Joonas Lahtinen wrote:
On Fri, 2017-11-03 at 11:09 -0700, Oscar Mateo wrote:
This is for WAs that need to touch registers that get saved/restored
together with the logical context. The idea is that WAs are "pretty"
static, so a table is more declarative than a programmatic approach.
Note however that some amount of caching is needed for those things
that are dynamic (e.g. things that need some calculation, or have
criteria different from the more obvious GEN + stepping).

Also, this makes very explicit which WAs live in the context.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Signed-off-by: Oscar Mateo <oscar.mateo@xxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
<SNIP>

+struct i915_wa_reg;
+
+typedef bool (* wa_pre_hook_func)(struct drm_i915_private *dev_priv,
+				  struct i915_wa_reg *wa);
+typedef void (* wa_post_hook_func)(struct drm_i915_private *dev_priv,
+				   struct i915_wa_reg *wa);
To avoid carrying any variables over, how about just an apply() hook?
Also, you don't have to have "_hook" going there, it's tak

Not all WAs are applied in the same way: ctx-style workarounds are 
emitted as LRI commands to the ring. Do you treat those differently?

  struct i915_wa_reg {
+	const char *name;
We may want some Kconfig option for skipping these.

Sure. But we should first try to decide if we want to store this at all:
what do we expect to use it for? Is it worth it?

+	enum wa_type {
+		I915_WA_TYPE_CONTEXT = 0,
+		I915_WA_TYPE_GT,
+		I915_WA_TYPE_DISPLAY,
+		I915_WA_TYPE_WHITELIST
+	} type;
+
Any specific reason not to have the gen here too? Then you can have one
big table, instead of tables of tables. Then the numeric code of a WA
(position in that table) would identify it just as well as the WA name
(which is nice-to-have information, so it could be a config-time opt-in).

Such a "big table" would be quite big, indeed. And we know we want to 
apply the workarounds from at least four different places, so looping 
through the table each and every time to find the relevant WAs seems 
like a waste. Also, in some places we would have to loop more than once
(to know the number of WAs to apply before we can reserve space in the
ring for ctx-style WAs, for example).
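
To make the cost concrete, here is a rough sketch of the two passes a single flat table would force on the ctx-style path. Everything in it (struct flat_wa, count_was(), emit_ctx_was(), the per-entry gen field) is made up for illustration and is not code from this series; the real path would emit LRI commands to the ring rather than fill a plain buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types: not the structures from the patch. */
enum wa_type { WA_TYPE_CONTEXT, WA_TYPE_GT, WA_TYPE_DISPLAY, WA_TYPE_WHITELIST };

struct flat_wa {
	uint8_t gen;		/* GEN this entry applies to */
	enum wa_type type;
	uint32_t reg;		/* register offset */
	uint32_t value;
};

/* First pass: count matching entries so space can be reserved up front. */
static size_t count_was(const struct flat_wa *tbl, size_t n,
			uint8_t gen, enum wa_type type)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (tbl[i].gen == gen && tbl[i].type == type)
			count++;
	return count;
}

/*
 * Second pass: write (reg, value) pairs for the context WAs into out.
 * Returns the number of dwords written, or 0 if out is too small.
 */
static size_t emit_ctx_was(const struct flat_wa *tbl, size_t n, uint8_t gen,
			   uint32_t *out, size_t out_dwords)
{
	size_t needed = 2 * count_was(tbl, n, gen, WA_TYPE_CONTEXT);
	size_t i, pos = 0;

	if (needed > out_dwords)
		return 0;

	for (i = 0; i < n; i++) {
		if (tbl[i].gen != gen || tbl[i].type != WA_TYPE_CONTEXT)
			continue;
		out[pos++] = tbl[i].reg;
		out[pos++] = tbl[i].value;
	}
	return pos;
}
```

With per-GEN, per-type tables the count is simply the array size and the first pass disappears.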

I could also go for 4 slightly smaller tables (one per type of WA) but 
then there is another problem to solve: how do you record WAs that apply 
for all revisions of one GEN, but a smaller number of revisions of 
another? (e.g. WaDisableFenceDestinationToSLM applies to all BDW 
steppings but only KBL A0).

+	u8 since;
+	u8 until;
Most seem to have ALL_REVS, so this could be after the coarse-grained
gen-check in the apply function.

So every single WA that applies to specific REVS gets an "apply" 
function? That looks like a lot of functions (I count 25 WAs that only 
apply to some steppings already). Or are you simply saying here that I 
check the GEN before checking the stepping (which is the only order that 
makes sense anyway)?
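
Just so we are talking about the same thing, this is roughly the check I have in mind: GEN first, stepping range second, with no per-WA function involved. The names (struct wa_range, wa_applies(), the EXAMPLE_REV_* bounds, and a gen field sitting next to since/until) are assumptions for the sake of the example, and the numbers are only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds meaning "every stepping". */
#define EXAMPLE_REV_FIRST	0x00
#define EXAMPLE_REV_LAST	0xff

/* Hypothetical table entry: only the fields needed for the check. */
struct wa_range {
	uint8_t gen;	/* coarse-grained criterion */
	uint8_t since;	/* first stepping the WA applies to (inclusive) */
	uint8_t until;	/* last stepping the WA applies to (inclusive) */
};

/* Check the GEN before the stepping; the stepping only matters if GEN matches. */
static bool wa_applies(const struct wa_range *wa, uint8_t gen, uint8_t revid)
{
	if (wa->gen != gen)
		return false;

	return revid >= wa->since && revid <= wa->until;
}
```

An entry for "all steppings" would use the two bounds, while an "A0 only" entry would set since and until to the same value.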

+
  	i915_reg_t addr;
-	u32 value;
-	/* bitmask representing WA bits */
  	u32 mask;
+	u32 value;
+	bool is_masked_reg;
I'd hide this detail into the apply function.

I see. But if you don't store the mask: what do you output in debugfs?
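
For reference, this is how I picture the stored mask being used. The sketch below models a masked register (the upper 16 bits of the written dword select which of the lower 16 bits change) and the kind of comparison debugfs could do on readback. masked_write() and wa_readback_ok() are hypothetical helpers written for this mail, not driver functions, and the MASK macros are the ones from the patch with a cast added so they work outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK(mask, value)	((uint32_t)(mask) << 16 | (value))
#define MASK_ENABLE(x)		(MASK((x), (x)))
#define MASK_DISABLE(x)		(MASK((x), 0))

/*
 * Model of a masked register write: only the bits selected by the
 * upper half of the written dword are updated in the lower half.
 */
static uint32_t masked_write(uint32_t old, uint32_t written)
{
	uint32_t mask = written >> 16;
	uint32_t bits = written & 0xffff;

	return ((old & ~mask) | (bits & mask)) & 0xffff;
}

/* Readback check: compare only the bits the WA is responsible for. */
static bool wa_readback_ok(uint32_t read, uint32_t mask, uint32_t value)
{
	return (read & mask) == (value & mask);
}
```

Without the mask stored somewhere, the second helper has nothing to compare against.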

+
+	wa_pre_hook_func pre_hook;
+	wa_post_hook_func post_hook;
	bool (*apply)(const struct i915_wa *wa,
		      struct drm_i915_private *dev_priv);

+	u32 hook_data;
+	bool applied;
The big point would be to make this into const, so "applied" would
defeat that.

Yeah, I realized. Keeping a separate bitmask of which WAs have been 
applied is not a big deal, but then I became aware that there are many 
more things that would need to be cached. For example, some WAs require
computing the actual value you write into their register. What do you
do with those? (remember that you still want to print the expected value 
in debugfs for these).
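
The bookkeeping part is the easy bit. A minimal sketch of tracking "applied" on the side so the table itself can stay const, assuming at most 64 entries per table; struct wa_state and both helpers are invented for this example. It does nothing for the computed-value problem, which is the harder one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_WAS	64	/* assumption: one bit per table entry */

/* Mutable state kept next to (not inside) the const WA table. */
struct wa_state {
	uint64_t applied;	/* bit i set => table entry i was applied */
};

static void wa_mark_applied(struct wa_state *st, size_t idx)
{
	if (idx < EXAMPLE_MAX_WAS)
		st->applied |= (uint64_t)1 << idx;
}

static bool wa_is_applied(const struct wa_state *st, size_t idx)
{
	return idx < EXAMPLE_MAX_WAS && ((st->applied >> idx) & 1);
}
```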

<SNIP>

+#define MASK(mask, value)	((mask) << 16 | (value))
+#define MASK_ENABLE(x)		(MASK((x), (x)))
+#define MASK_DISABLE(x)		(MASK((x), 0))

-#define WA_REG(addr, mask, val) do { \
-		const int r = wa_add(dev_priv, (addr), (mask), (val)); \
-		if (r) \
-			return r; \
-	} while (0)
+#define SET_BIT_MASKED(m) 		\
+	.mask = (m),			\
+	.value = MASK_ENABLE(m),	\
+	.is_masked_reg = true

-#define WA_SET_BIT_MASKED(addr, mask) \
-	WA_REG(addr, (mask), _MASKED_BIT_ENABLE(mask))
+#define CLEAR_BIT_MASKED(m) 		\
+	.mask = (m),			\
+	.value = MASK_DISABLE(m),	\
+	.is_masked_reg = true

-#define WA_CLR_BIT_MASKED(addr, mask) \
-	WA_REG(addr, (mask), _MASKED_BIT_DISABLE(mask))
+#define SET_FIELD_MASKED(m, v) 		\
+	.mask = (m),			\
+	.value = MASK(m, v),		\
+	.is_masked_reg = true
Let's try to have the struct i915_wa as small as possible, so this could
be calculated in the apply function.

So, avoiding the macros this would indeed become rather declarative;

{
	WA_NAME("WaDisableAsyncFlipPerfMode")
	.gen = ...,
	.reg = MI_MODE,
	.value = ASYNC_FLIP_PERF_DISABLE,
	.apply = set_bit_masked,
},
Or, we could also have;

static const struct i915_wa WaDisableAsyncFlipPerfMode = {
	.gen = ...,
	.reg = MI_MODE,
	.value = ASYNC_FLIP_PERF_DISABLE,
	.apply = set_bit_masked,
};

And then one array of those.

	WA(WaDisableAsyncFlipPerfMode),

This is the list of problems we need to solve before we can go forward 
with this design:

- What to do with WAs that don't know a priori what .value should be, 
because it gets computed in places like skl_tune_iz_hashing or 
use_gtt_cache? (yes, computing in the apply function is the immediate 
answer, but then... how do you output that in debugfs?).
- What to do with context-style WAs, that are emitted instead of 
applied, as I mentioned above?
- What to do with whitelist-style functions, where you need to access 
the .reg field of i915_reg_t to know the .value? Also, the .reg depends 
on the engine (although I guess you can always statically codify that in 
the table and apply the whitelist WAs later, once all the engines are up).
- You are not storing .since/.until. Does that mean every WA that 
applies to only some steppings gets a custom apply function?
- If you don't store the computed mask anywhere, what do you output in 
debugfs (which is the real improvement we want to achieve)?
- Something to be careful about: some WAs are named the same, but their 
reg/value is different (because the register has changed in one 
particular GEN or whatever). The solution could be a modifier to the 
name (WaSomething_bdw_chv and WaSomething_skl) but this could be a 
source of errors.
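
To illustrate the whitelist point from the list above: the value that gets written is the offset of the whitelisted register, and the address written to depends on the engine, so neither fits a plain static reg/value pair. A rough sketch follows; example_reg_t stands in for i915_reg_t, resolve_whitelist() is a hypothetical helper, and EXAMPLE_NONPRIV_OFFSET is an illustrative number that I have not checked against the spec.

```c
#include <stdint.h>

/* Stand-in for the driver's i915_reg_t wrapper. */
typedef struct { uint32_t reg; } example_reg_t;

/* Illustrative offset of the first whitelist slot from the engine base. */
#define EXAMPLE_NONPRIV_OFFSET	0x4d0

/* Static part of a whitelist WA: only the register to be whitelisted. */
struct example_whitelist_wa {
	const char *name;
	example_reg_t whitelisted;
};

/* What actually gets written once the engine is known. */
struct example_reg_write {
	uint32_t addr;
	uint32_t value;
};

/* Resolve the static entry against an engine base and a free slot index. */
static struct example_reg_write
resolve_whitelist(const struct example_whitelist_wa *wa,
		  uint32_t engine_mmio_base, unsigned int slot)
{
	struct example_reg_write w;

	w.addr = engine_mmio_base + EXAMPLE_NONPRIV_OFFSET + slot * 4;
	w.value = wa->whitelisted.reg;	/* the offset itself is the value */
	return w;
}
```

So the table could hold the static half, with the resolve step done once all the engines are up.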

Then you could at compile time decide if you stringify and store the
name. But that'd be more const data than necessary (pointers to
structs, instead of an array of structs).

Regards, Joonas

One more thing: I still urge you to reconsider merging what we already have, 
and doing these improvements (once we agree on a design) later on. The 
reason being that the sooner we get a list of all WAs in debugfs, the 
better (it can be used later on to verify any further improvements we 
make).

Thanks for the review,
Oscar
