Re: [PATCH 0/4] Steer multicast register workaround verification

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Fri, 1 May 2020 09:01:42 +0100

Hi,

On 01/05/2020 00:15, Matt Roper wrote:
We're seeing some CI errors indicating that a workaround did not apply
properly on EHL/JSL.  The workaround in question is updating a multicast
register, the failures are only seen on specific CI machines, and the
failures only seem to happen on resets and such rather than on initial
driver load.  It seems likely that the culprit here is failure to steer
the multicast register readback on a SKU that has slice0 / subslice0
fused off.

This series makes a couple changes:
  * Workaround verification will explicitly steer MCR registers by
    calling read_subslice_reg rather than a regular read.
  * New multicast ranges are added for gen11 and gen12.  Sadly this
    information is still missing from the bspec (just like the updated
    forcewake tables).  The hardware guys have given us a spreadsheet
    with both the forcewake and the multicast information while they work
    on getting the spec properly updated, so that's where the new ranges
    come from.

I think there are multiple things here. To begin with, newly discovered 
ranges are of course a savior.

But I am not sure about the approach of using intel_read_subslice_reg in 
wa_verify. It is one suspicion that 0xfdc is lost on reset, but we do 
reprogram it afterwards don't we? And since it is the first register in 
the list it is supposed to be in place before the rest of verification 
runs, no?

A year or two I tried figuring this for Icelake and failed, but AFAIR 
(maybe my experiments can be found somewhere on trybot patchwork), I 
even tried both applying the affected ones via unicast (for each ss, or 
l3 where applicable) and also verifying a single register in all enabled 
ss. AFAIR there were still some issues there. Granted my memory could be 
leaky.. But I think this multiple write/verify could still be useful.

(Now that I think about it, I think that the problem area back when I 
experiementing with it was more suspend/resume.. hm..)

My main concern is that with current code we effectively have, after reset:

intel_gt_apply_workarounds:
   program 0xfdc
   program the rest of wa
verify_wa
   do reads using configured 0xfdc

So MCR should be correct. This series seems to be doing:

intel_gt_apply_workarounds:
   program 0xfdc
	* store ss used for MCR configuration
   program the rest of wa
verify_wa
   Do reads but reconfigure 0xfdc before every register in range,
   but to the same value as in initial configuration.

Is this correct? Is the thinking then simply writing the same value to 
0xfdc multiple times fixes things?

Regards,

Tvrtko

P.S. Update, found the experiments, listing some of them:

https://patchwork.freedesktop.org/series/64183/
https://patchwork.freedesktop.org/series/64013/

It reminded me that there were some unexplained issues with regards of 
where I used ffs or fls for finding the valid common MCR setting between 
L3 and SSEU. I think we use a different one than Windows but ours works 
better for our verification, empirically at least. Usual disclaimer 
about my leaky memory applies here.

In addition to MCR and forcewake, there's supposed to be some more bspec
updates coming soon that deal with steering (i.e., different MCR ranges
should actually be using different registers to steer rather than just
the 0xFDC register we're familiar with); I don't have the full details
on that yet, so those updates will have to wait until we actually have
an updated spec.

References: https://gitlab.freedesktop.org/drm/intel/issues/1222

Matt Roper (4):
   drm/i915: Setup multicast register steering for all gen >= 10
   drm/i915: Steer multicast register readback in wa_verify
   drm/i915: Don't skip verification of MCR engine workarounds
   drm/i915: Add MCR ranges for gen11 and gen12

  drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  17 +-
  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 146 ++++++++++++------
  .../gpu/drm/i915/gt/intel_workarounds_types.h |   2 +
  4 files changed, 110 insertions(+), 58 deletions(-)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx