Re: [PATCH 1/4] drm/i915/gt: Remove platform comments from workarounds

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 4 Jan 2023 09:58:13 +0000

On 23/12/2022 18:28, Lucas De Marchi wrote:
On Fri, Dec 23, 2022 at 09:02:35AM +0000, Tvrtko Ursulin wrote:

On 22/12/2022 15:55, Lucas De Marchi wrote:
On Thu, Dec 22, 2022 at 10:27:00AM +0000, Tvrtko Ursulin wrote:

On 22/12/2022 08:25, Lucas De Marchi wrote:
The comments are redundant to the checks being done to apply the
workarounds and very often get outdated as workarounds need to be
extended to new platforms or steppings.  Remove them altogether with
the following matches (platforms extracted from intel_workarounds.c):

    find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
's/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*/\1/'
    find drivers/gpu/drm/i915/gt/ -name '*.c' | xargs sed -i -E \
's/(Wa.*):(bdw|chv|bxt|glk|skl|kbl|cfl|cfl|whl|cml|aml|chv|cl|bw|ctg|elk|ilk|snb|dg|pvc|g4x|ilk|gen|glk|kbl|cml|glk|kbl|cml|hsw|icl|ehl|ivb|hsw|ivb|vlv|kbl|pvc|rkl|dg|adl|skl|skl|bxt|blk|cfl|cnl|glk|snb|tgl|vlv|xehpsdv).*\*\//\1

Same things was executed in the gem directory, omitted here for 
brevity.

There were a few false positives that included the workaround
description. Those were manually patched.

sed -E 's/(Wa[a-zA-Z0-9_]+)[:,]([a-zA-Z0-9,-_\+\[]{2,})/\1/'

then there are false negatives. We have Was in the form
"Wa_xxx:tgl,dg2, mtl". False positives we can fixup, false negatives
we simply don't see. After running that in gt/:

$ git grep ": mtl" -- drivers/gpu/drm/i915/
drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
drivers/gpu/drm/i915/gt/intel_gt_pm.c:  /* Wa_14017073508: mtl */
drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c:       * Wa_14017073508: mtl
drivers/gpu/drm/i915/i915_reg.h:/* Wa_14017210380: mtl */

I was going with the platform names to avoid the false
negatives and because I was entertaining the idea of only doing this for
latest platforms where we do have the "Wa_[[:number:]]" form

Maybe..

Matt recently said he has this worked planned, but more importantly 
- I gather then that the WA lookup tool definitely does not output 
these strings?

Whatever it does it's true only at the time it's called. It simply 
tells what
are the platforms and steppings the Wa applies to. We can change the
output to whatever we want, but that is not the point.
Those comments get stale and bring no real value as they match 1:1
what the code is supposed to be doing. Several times a patch has to
update just that comment to "extend a workaround" to a next platform.
This is not always done, so we get a comment that doesn't match what is
supposed to be there.

Tl;dr; version - lets park this until January and discuss once 
everyone is back.

I'll leave my comment here since I will be out until mid January.

Longer version. I've been trying to get us talking about this a couple 
times before and I'd really like to close with an explicit consensus, 
discussion points addressed instead of skipped and just moving ahead 
with patches.

References:
 3fcf71b9-337f-6186-7b00-27cbfd116743@xxxxxxxxxxxxxxx
 Y5j0b/bykbitCa4Q@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

So point I wanted to discuss is whether these comments are truly 
useless or maybe they can help during review. If the tool can actually 
output them then I am leaning towards that they can be.

I consider "can the tool output xyz?" asking the wrong question.
"The tool", which is our own small python script querying a database can
output anything like that if we want to. The database has information of
what are the platforms/steppings for each the WA is known to be applied
*today*. And that can change and do change often, particularly for early
steppings and recent platforms.

Thought is, when a patch comes for review adding a new platform, 
stepping, whatever, to an existing if condition, if it contains the 
comments reviewer can more easily spot a hyphotetical logic inversion 
error or similar. It is also trivial to check that both condition and 
comment have been updated. (So lets not be rash and remove something 
maybe useful just because it can go stale *only if* reviewers are not 
giving sufficient attention that changes are made in tandem.)

I can argue to the other side too. We don't have comments in the kernel
like

     /* Add 1 to i */
     i += 1;

This is exactly what these comments are doing. And they are misleading

I'll file this under "Reductio ad absurdum", kind of. :)

and may introduce bugs rather than helping reviewing:

     Wa_12345:tgl[a0,c0)
     if (IS_TGL_GRAPHICS_STEP(STEP_A0, STEP_B0)

One might read the comment, skipping over the condition and thinking
"ok, we already extended this WA to B* steppings, which doesn't match
the code.

That would be reviewer error to assume B0 is the last B stepping, 
without actually checking. Equally as reviewer error would be to assume 
any WA adding patch is adding the correct conditions, again, without 
actually checking. Which leads me to ...

From a slightly different angle - do we expect anyone reviewing 
workaround patches will cross-check against the tool? Would it be 
simpler and more efficient that they could just cross-check against 
the comment output from the tool and put into the patch by the author?

see above. Someone cross-checking the comment is cross-checking the
wrong thing. As I said, it happens more on early enabling of a platform.

... my point which seems to have been missed by both, well question 
really, do you expect every reviewer to cross-check against the WA 
database when reviewing WA changes? I don't see that was answered.

I guarantee that it won't happen and people will rubber stamp. So my 
argument was that we could make it both easier for reviewers *and* 
decrease the likelyhood of misses, if we kept platforms designators in 
comments.

Yeah it is much easier to rip them out that to find and fix the ones 
which went out of sync but that shouldn't be high on the list of criteria.

Argument that it is easy to overlook during review that comments and 
code do not match I don't think holds. That describes a very sloppy 
review. And if review is assumed to be that sloppy, do you really trust 
review to check against the WA database?

So my argument is that it is trivial for reviewers to spot comments and 
code do not match. Trivial and fast. And it's trivial (I hope) for the 
WA tool to output the right format for pasting in comments.

Those are the points I would like to have explicitly discounted before 
proceeding. Maybe to be even clearer the workflow would be like this:

Patch author:

1. Runs the WA tool for a WA number. Tool outputs text.
2. Pastes text verbatim in the comment.
3. Adjusts code to match.

Reviewer:

1. Verifies both code and comment were changed.
2. Verifies code matches the comment.

If the counter proposal is, patch author:

1. Runs the WA tool for a WA number. Tool outputs text.
2. Adjusts code to match.

Reviewer:

1. Runs the WA tool. Tool outputs text.
2. Checks patch matchs the WA tool output.

I will accept it but I strongly believe skipping of step 2 will happen 
and it will be impossible to know. Rubber stamping with the options of 
comments+code at least leaves a trace of comment and code being out of sync.

And point here to stress out is that accidental logic errors (missed 
workarounds) can be super expensive to debug in the field. Sometimes 
it can literally take _months_ for sporadic and hard to reproduce 
issues to get debugged, handed over between the teams, etc. So any way 
in which we can influence the likelyhood of that happening is 
something to weigh carefully.

yes, that's why I want to remove the comments: from my experience they
are more a source of bugs rather than helping.

Secondary but also important - if i915 is end of line then an extra 
why we want to rip out this for ancient platforms. Is the cost/benefit 
positive there?

yep, here I agree and was my argument about using the platform names
rather than a more "catch all" regex. I think doing this only for tgl+
platforms or even dg2+ would be ok.

Okay this is something to have as a 2nd option indeed. DG2 is out of 
force probe so maybe try with MTL. Although different rules for 
different platforms I don't know if will work in practice. Could be 
justt too complicated to be practical.

As a side note, and going back to the question of what the tool can 
output. Long time ago I had an idea where we could improve all this by 
making it completely data-driven. Have the WA database inspecting tool 
output a table which could be directly pasted into code and 
interpreted by i915.

For reference look at intel_workarounds_table.h in 
https://patchwork.freedesktop.org/patch/399377/?series=83580&rev=3 and 
see what you thing. That was just a sketch of the idea, not complete, 
and yes, i915 end of life point makes it moot.

now that xe is announced I can talk about this part... this was more
or less what I implemented in xe: it's a table with
"register + condition + action". There are the most common condition
checks builtin + a function hook for the more advanced ones. During
binding the driver walks the table and coalesces the entries creating
a per-register value that can be used at the proper times, depending if
they are gt, engine, context workarounds.

Cool, I support that high level approach.

Regards,

Tvrtko