Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matt,

first of all thanks a lot for the observations you are raising.

On Wed, Feb 21, 2024 at 12:51:04PM -0800, Matt Roper wrote:
> On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote:
> > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote:
> > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote:

...

> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > index a425db5ed3a2..e19df4ef47f6 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
> > > >  	}
> > > >  }
> > > >  
> > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > > > +{
> > > > +	if (!IS_DG2(gt->i915))
> > > > +		return;
> > > > +
> > > > +	intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
> > > 
> > > This doesn't look right to me.  A value of 0 means every cslice gets
> > > associated with CCS0.
> > 
> > Yes, that's what I'm trying to do. The behavior I'm looking for
> > is this one:
> > 
> > 	 /*
> > 	  ...
> >           * With 1 engine (ccs0):
> >           *   slice 0, 1, 2, 3: ccs0
> >           *
> >           * With 2 engines (ccs0, ccs1):
> >           *   slice 0, 2: ccs0
> >           *   slice 1, 3: ccs1
> >           *
> >           * With 4 engines (ccs0, ccs1, ccs2, ccs3):
> >           *   slice 0: ccs0
> >           *   slice 1: ccs1
> >           *   slice 2: ccs2
> >           *   slice 3: ccs3
> > 	  ...
> > 	  */
> > 
> > where the user can configure runtime the mode, making sure that
> > no client is connected to i915.
> > 
> > But, this needs to be written 
> > 
> > As we are now forcing mode '1', then all cslices are connected
> > with ccs0.
> 
> Right --- and that's what I'm pointing out as illegal.  I think that
> code comment above was taken out of context from a different RFC series;
> that's not an accurate description of the behavior we want here.
> 
> First, the above comment is using ccs# to refer to userspace engines,
> not hardware engines.  As a simple example, DG2-G11 only ever has a
> single CCS which userspace sees as "instance 0" but which is actually
> CCS1 at the hardware level.  If you try to follow the comment above when
> programming CCS_MODE, you've assigned all of the cslices to a
> non-existent engine and assigned no cslices to the CCS engine that
> actually exists.  For DG2-G10 (and I think DG2-G12), there are different
> combinations of fused-off / not-fused-off engines that will always show
> up in userspace as CCS0-CCSn, even if those don't match the hardware
> IDs.
> 
> Second, the above comment is assuming that you have a part with a
> maximum fusing config (i.e., all cslices present).  Using DG2-G11 again
> as an example, there's also only a single cslice (cslice1), so if you
> tell CCS1 that it's allowed to use EUs from non-existent cslice0,
> cslice2, and cslice3, you might not get the behavior you were hoping
> for.

if the hardware slices are fused off we wouldn't see them in a
first place, right? And that's anyway a permanent configuration
that wouldn't affect the patch.

BTW, is there any machine I can test this scenario?

> > > On a DG2-G11 platform, that will flat out break
> > > compute since CCS0 is never present (G11 only has a single CCS and it's
> > > always the hardware's CCS1).  Even on a G10 or G12 this could also break
> > > things depending on the fusing of your card if the hardware CCS0 happens
> > > to be missing.
> > > 
> > > Also, the register says that we need a field value of 0x7 for each
> > > cslice that's fused off.  By passing 0, we're telling the CCS engine
> > > that it can use cslices that may not actually exist.
> > 
> > does it? Or do I need to write the id (0x0-0x3) of the user
> > engine? That's how the mode is calculated in this algorithm.
> 
> 0x0 - 0x3 are how you specify that a specific CCS engine can use the
> cslice.  If the cslice can't be used at all (i.e., it's fused off), then
> you need to program a 0x7 to ensure no engines try to use the
> non-existent DSS/EUs.

I planned to limit this to the only DG2 (and ATSM, of course).
Do you think it would it help?

Andi




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux