On Wed, Feb 21, 2024 at 01:12:18AM +0100, Andi Shyti wrote: > Hi Matt, > > thanks a lot for looking into this. > > On Tue, Feb 20, 2024 at 03:39:18PM -0800, Matt Roper wrote: > > On Tue, Feb 20, 2024 at 03:35:26PM +0100, Andi Shyti wrote: > > [...] > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c > > > index 833987015b8b..7041acc77810 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c > > > @@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915) > > > if (engine->uabi_class == I915_NO_UABI_CLASS) > > > continue; > > > > > > + /* > > > + * Do not list and do not count CCS engines other than the first > > > + */ > > > + if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE && > > > + engine->uabi_instance > 0) { > > > + i915->engine_uabi_class_count[engine->uabi_class]--; > > > + continue; > > > + } > > > > Wouldn't it be simpler to just add a workaround to the end of > > engine_mask_apply_compute_fuses() if we want to ensure only a single > > compute engine gets exposed? Then both the driver internals and uapi > > will agree that's there's just one CCS (and on which one there is). > > > > If we want to do something fancy with "hotplugging" a new engine later > > on or whatever, that can be handled in the future series (although as > > noted on the previous patch, it sounds like these changes might not > > actually be aligned with the workaround we were trying to address). > > The hotplugging capability is one of the features I was looking > for, actually. > > I have done some more refactoring in this piece of code in > upcoming patches. > > Will check, though, if I can do something with compute_fuses(), > even though, the other cslices are not really fused off (read > below). > > > > + > > > rb_link_node(&engine->uabi_node, prev, p); > > > rb_insert_color(&engine->uabi_node, &i915->uabi_engines); > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c > > > index a425db5ed3a2..e19df4ef47f6 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > > > @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) > > > } > > > } > > > > > > +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) > > > +{ > > > + if (!IS_DG2(gt->i915)) > > > + return; > > > + > > > + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0); > > > > This doesn't look right to me. A value of 0 means every cslice gets > > associated with CCS0. > > Yes, that's what I'm trying to do. The behavior I'm looking for > is this one: > > /* > ... > * With 1 engine (ccs0): > * slice 0, 1, 2, 3: ccs0 > * > * With 2 engines (ccs0, ccs1): > * slice 0, 2: ccs0 > * slice 1, 3: ccs1 > * > * With 4 engines (ccs0, ccs1, ccs2, ccs3): > * slice 0: ccs0 > * slice 1: ccs1 > * slice 2: ccs2 > * slice 3: ccs3 > ... > */ > > where the user can configure runtime the mode, making sure that > no client is connected to i915. > > But, this needs to be written > > As we are now forcing mode '1', then all cslices are connected > with ccs0. Right --- and that's what I'm pointing out as illegal. I think that code comment above was taken out of context from a different RFC series; that's not an accurate description of the behavior we want here. First, the above comment is using ccs# to refer to userspace engines, not hardware engines. As a simple example, DG2-G11 only ever has a single CCS which userspace sees as "instance 0" but which is actually CCS1 at the hardware level. If you try to follow the comment above when programming CCS_MODE, you've assigned all of the cslices to a non-existent engine and assigned no cslices to the CCS engine that actually exists. For DG2-G10 (and I think DG2-G12), there are different combinations of fused-off / not-fused-off engines that will always show up in userspace as CCS0-CCSn, even if those don't match the hardware IDs. Second, the above comment is assuming that you have a part with a maximum fusing config (i.e., all cslices present). Using DG2-G11 again as an example, there's also only a single cslice (cslice1), so if you tell CCS1 that it's allowed to use EUs from non-existent cslice0, cslice2, and cslice3, you might not get the behavior you were hoping for. > > > On a DG2-G11 platform, that will flat out break > > compute since CCS0 is never present (G11 only has a single CCS and it's > > always the hardware's CCS1). Even on a G10 or G12 this could also break > > things depending on the fusing of your card if the hardware CCS0 happens > > to be missing. > > > > Also, the register says that we need a field value of 0x7 for each > > cslice that's fused off. By passing 0, we're telling the CCS engine > > that it can use cslices that may not actually exist. > > does it? Or do I need to write the id (0x0-0x3) of the user > engine? That's how the mode is calculated in this algorithm. 0x0 - 0x3 are how you specify that a specific CCS engine can use the cslice. If the cslice can't be used at all (i.e., it's fused off), then you need to program a 0x7 to ensure no engines try to use the non-existent DSS/EUs. Matt > > > > +} > > > + > > [...] > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > > index cf709f6c05ae..c148113770ea 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h > > > @@ -1605,6 +1605,8 @@ > > > #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) > > > #define GEN12_CAGF_MASK REG_GENMASK(19, 11) > > > > > > +#define XEHP_CCS_MODE _MMIO(0x14804) > > > > Nitpick: this doesn't seem to be in the proper place and also breaks > > the file's convention of using tabs to move over to column 48 for the > > definition value. > > This was something I actually forgot to fix. Thanks! -- Matt Roper Graphics Software Engineer Linux GPU Platform Enablement Intel Corporation