Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

Matt Roper <matthew.d.roper@xxxxxxxxx> · Wed, 21 Feb 2024 12:51:04 -0800

On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> Hi Matt,
> 
> thanks a lot for looking into this.
> 
> On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:
> 
> [...]
> 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > index 833987015b8b..7041acc77810 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
> > >  		if (engine->uabi_class == I915_NO_UABI_CLASS)
> > >  			continue;
> > >  
> > > +		/*
> > > +		 * Do not list and do not count CCS engines other than the first
> > > +		 */
> > > +		if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
> > > +		    engine->uabi_instance > 0) {
> > > +			i915->engine_uabi_class_count[engine->uabi_class]--;
> > > +			continue;
> > > +		}
> > 
> > Wouldn't it be simpler to just add a workaround to the end of
> > engine_mask_apply_compute_fuses() if we want to ensure only a single
> > compute engine gets exposed?  Then both the driver internals and uapi
> > will agree that's there's just one CCS (and on which one there is).
> > 
> > If we want to do something fancy with "hotplugging" a new engine later
> > on or whatever, that can be handled in the future series (although as
> > noted on the previous patch, it sounds like these changes might not
> > actually be aligned with the workaround we were trying to address).
> 
> The hotplugging capability is one of the features I was looking
> for, actually.
> 
> I have done some more refactoring in this piece of code in
> upcoming patches.
> 
> Will check, though, if I can do something with compute_fuses(),
> even though, the other cslices are not really fused off (read
> below).
> 
> > > +
> > >  		rb_link_node(&engine->uabi_node, prev, p);
> > >  		rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index a425db5ed3a2..e19df4ef47f6 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > >  	}
> > >  }
> > >  
> > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > +{
> > > +	if (!IS_DG2(gt->i915))
> > > +		return;
> > > +
> > > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > 
> > This doesn't look right to me.  A value of 0 means every cslice gets
> > associated with CCS0.
> 
> Yes, that's what I'm trying to do. The behavior I'm looking for
> is this one:
> 
> 	 /*
> 	  ...
>           * With 1 engine (ccs0):
>           *   slice 0, 1, 2, 3: ccs0
>           *
>           * With 2 engines (ccs0, ccs1):
>           *   slice 0, 2: ccs0
>           *   slice 1, 3: ccs1
>           *
>           * With 4 engines (ccs0, ccs1, ccs2, ccs3):
>           *   slice 0: ccs0
>           *   slice 1: ccs1
>           *   slice 2: ccs2
>           *   slice 3: ccs3
> 	  ...
> 	  */
> 
> where the user can configure runtime the mode, making sure that
> no client is connected to i915.
> 
> But, this needs to be written 
> 
> As we are now forcing mode '1', then all cslices are connected
> with ccs0.

Right --- and that's what I'm pointing out as illegal.  I think that
code comment above was taken out of context from a different RFC series;
that's not an accurate description of the behavior we want here.

First, the above comment is using ccs# to refer to userspace engines,
not hardware engines.  As a simple example, DG2-G11 only ever has a
single CCS which userspace sees as "instance 0" but which is actually
CCS1 at the hardware level.  If you try to follow the comment above when
programming CCS_MODE, you've assigned all of the cslices to a
non-existent engine and assigned no cslices to the CCS engine that
actually exists.  For DG2-G10 (and I think DG2-G12), there are different
combinations of fused-off / not-fused-off engines that will always show
up in userspace as CCS0-CCSn, even if those don't match the hardware
IDs.

Second, the above comment is assuming that you have a part with a
maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
as an example, there's also only a single cslice (cslice1), so if you
tell CCS1 that it's allowed to use EUs from non-existent cslice0,
cslice2, and cslice3, you might not get the behavior you were hoping
for.

> 
> > On a DG2-G11 platform, that will flat out break
> > compute since CCS0 is never present (G11 only has a single CCS and it's
> > always the hardware's CCS1).  Even on a G10 or G12 this could also break
> > things depending on the fusing of your card if the hardware CCS0 happens
> > to be missing.
> > 
> > Also, the register says that we need a field value of 0x7 for each
> > cslice that's fused off.  By passing 0, we're telling the CCS engine
> > that it can use cslices that may not actually exist.
> 
> does it? Or do I need to write the id (0x0-0x3) of the user
> engine? That's how the mode is calculated in this algorithm.

0x0 - 0x3 are how you specify that a specific CCS engine can use the
cslice.  If the cslice can't be used at all (i.e., it's fused off), then
you need to program a 0x7 to ensure no engines try to use the
non-existent DSS/EUs.

Matt

> 
> > > +}
> > > +
> 
> [...]
> 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > index cf709f6c05ae..c148113770ea 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> > > @@ -1605,6 +1605,8 @@
> > >  #define   GEN12_VOLTAGE_MASK			REG_GENMASK(10, 0)
> > >  #define   GEN12_CAGF_MASK			REG_GENMASK(19, 11)
> > >  
> > > +#define XEHP_CCS_MODE                          _MMIO(0x14804)
> > 
> > Nitpick:  this doesn't seem to be in the proper place and also breaks
> > the file's convention of using tabs to move over to column 48 for the
> > definition value.
> 
> This was something I actually forgot to fix. Thanks!

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation