Hi Matt, On Wed, Mar 06, 2024 at 03:46:09PM -0800, Matt Roper wrote: > On Wed, Mar 06, 2024 at 02:22:45AM +0100, Andi Shyti wrote: > > The hardware should not dynamically balance the load between CCS > > engines. Wa_14019159160 recommends disabling it across all > > platforms. > > > > Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") > > Signed-off-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> > > Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > Cc: Matt Roper <matthew.d.roper@xxxxxxxxx> > > Cc: <stable@xxxxxxxxxxxxxxx> # v6.2+ > > --- > > drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + > > drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++ > > 2 files changed, 6 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > index 50962cfd1353..cf709f6c05ae 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > @@ -1478,6 +1478,7 @@ > > > > #define GEN12_RCU_MODE _MMIO(0x14800) > > #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) > > +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) > > > > #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) > > #define CHV_FGT_DISABLE_SS0 (1 << 10) > > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > index d67d44611c28..a2e78cf0b5f5 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c > > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c > > @@ -2945,6 +2945,11 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li > > > > /* Wa_18028616096 */ > > wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, UGM_FRAGMENT_THRESHOLD_TO_3); > > + > > + /* > > + * Wa_14019159160: disable the automatic CCS load balancing > > I'm still a bit concerned that this doesn't really match what this > specific workaround is asking us to do. There seems to be an agreement > on various internal email threads that we need to disable load > balancing, but there's no single specific workaround that officially > documents that decision. > > This specific workaround asks us to do a bunch of different things, and > the third item it asks for is to disable load balancing in very specific > cases (i.e., while the RCS is active at the same time as one or more CCS > engines). Taking this workaround in isolation, it would be valid to > keep load balancing active if you were just using the CCS engines and > leaving the RCS idle, or if balancing was turned on/off by the GuC > scheduler according to engine use at the moment, as the documented > workaround seems to assume will be the case. > > So in general I think we do need to disable load balancing based on > other offline discussion, but blaming that entire change on > Wa_14019159160 seems a bit questionable since it's not really what this > specific workaround is asking us to do and someone may come back and try > to "correct" the implementation of this workaround in the future without > realizing there are other factors too. It would be great if we could > get hardware teams to properly document this expectation somewhere > (either in a separate dedicated workaround, or in the MMIO tuning guide) > so that we'll have a more direct and authoritative source for such a > large behavioral change. On one had I think you are right, on the other hand I think this workaround has not properly developed in what we have been describing later. Perhaps, one solution would be to create a new generic workaround for all platforms with more than one CCS and put everyone at peace. But I don't know the process. Are you able to help here? Or Joonas? Thanks, Matt! Andi