Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > Quoting Mika Kuoppala (2019-09-24 11:21:38) >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: >> >> > Before we submit the first context to HW, we need to construct a valid >> > image of the register state. This layout is defined by the HW and should >> > match the layout generated by HW when it saves the context image. >> > Asserting that this should be equivalent should help avoid any undefined >> > behaviour and verify that we haven't missed anything important! >> > >> > Of course, having insisted that the initial register state within the >> > LRC should match that returned by HW, we need to ensure that it does. >> > >> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> >> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> >> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> > >> > +static u32 *set_offsets(u32 *regs, >> > + const u8 *data, >> > + const struct intel_engine_cs *engine) >> > +#define NOP(x) (BIT(7) | (x)) >> > +#define LRI(count, flags) ((flags) << 6 | (count)) >> > +#define POSTED BIT(0) >> > +#define REG(x) (((x) >> 2) | BUILD_BUG_ON_ZERO(x >= 0x200)) >> > +#define REG16(x) \ >> > + (((x) >> 9) | BIT(7) | BUILD_BUG_ON_ZERO(x >= 0x10000)), \ >> > + (((x) >> 2) & 0x7f) >> >> I am still not sure if the actual saving are worth the complexity. >> >> > +#define END() 0 >> > +{ >> > + const u32 base = engine->mmio_base; >> > + >> > + while (*data) { >> > + u8 count, flags; >> > + >> > + if (*data & BIT(7)) { /* skip */ >> > + regs += *data++ & ~BIT(7); >> > + continue; >> > + } >> > + >> > + count = *data & 0x3f; >> > + flags = *data >> 6; >> > + data++; >> > + >> > + *regs = MI_LOAD_REGISTER_IMM(count); >> > + if (flags & POSTED) >> > + *regs |= MI_LRI_FORCE_POSTED; >> > + if (INTEL_GEN(engine->i915) >= 11) >> > + *regs |= MI_LRI_CS_MMIO; >> > + regs++; >> > + >> > + GEM_BUG_ON(!count); >> > + do { >> > + u32 offset = 0; >> > + u8 v; >> > + >> > + do { >> > + v = *data++; >> > + offset <<= 7; >> > + offset |= v & ~BIT(7); >> > + } while (v & BIT(7)); >> >> ...but perhaps this amount of extra can be tolerated. >> >> Did you check how this would play out with just REG being wide enough? > > When I started, I thought we could get away with only one REG16. Looking > at the context image I think we might want a few non engine->mmio_base > regs in there (if I read it right, some of the 0x4000 range are per > engine). That will need a slightly different encoding as well :| > > No, I haven't but since you ask, I shall. Now we know the bloat diff and the complexity addition is tiny so I am fine with using the tighter REG/REG16 split. > >> > + >> > + *regs = base + (offset << 2); >> >> In here reader is yearning for an asserts of not trampling >> on wrong territory. > > If you have an idea for a good assert, go for it :) > > What range should be checked. offset < 0x1000 ? > I am fine at selftest trying to take the burden. (that can be read like that I can't make up good asserts) >> But I would guess that you want this part to be like >> oiled lightning and test the machinery with selftest..as the >> subject seems to promise. > > The importance is certainly placed on having a selftest and the > confidence in keeping our offsets in line with the HW. The goal was to > have a compact description for the register offsets, in terms of > readability I think the emphasis should be on the tables > (gen8_xcs_offsets[]). > >> > @@ -3092,7 +3451,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine) >> > engine->flags |= I915_ENGINE_HAS_PREEMPTION; >> > } >> > >> > - if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 12) >> > + if (engine->class != COPY_ENGINE_CLASS && INTEL_GEN(engine->i915) >= 11) >> > engine->flags |= I915_ENGINE_HAS_RELATIVE_MMIO; >> >> Ok, first I thought this was unintentional. But prolly not. >> Do you need it for the verifier to work? > > No, I ended up completely ignoring this flag as the HW does not > differentiate between engines. On gen11+, it sets the LRI flag everywhere > in the context image. > >> Could we still rip it out to be a first in the series. >> Just would want to differiante possible icl hickups apart >> from this patch. With the relative MMIO for gen11 lifted as a separate patch prior to this one, Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Sure. > >> > +static int live_lrc_layout(void *arg) >> > +{ >> > + struct intel_gt *gt = arg; >> > + struct intel_engine_cs *engine; >> > + enum intel_engine_id id; >> > + u32 *mem; >> > + int err; >> > + >> > + /* >> > + * Check the registers offsets we use to create the initial reg state >> > + * match the layout saved by HW. >> > + */ >> > + >> > + mem = kmalloc(PAGE_SIZE, GFP_KERNEL); >> > + if (!mem) >> > + return -ENOMEM; >> > + >> > + err = 0; >> > + for_each_engine(engine, gt->i915, id) { >> > + u32 *hw, *lrc; >> > + int dw; >> > + >> > + if (!engine->default_state) >> > + continue; >> > + >> > + hw = i915_gem_object_pin_map(engine->default_state, >> > + I915_MAP_WB); >> >> This default state is not pristine as we have trampled >> it with our first submission, right? > > It is the context image saved after the first request. > >> But being succeeded at doing so, the next context >> save should overwrite our trampling and it would >> then represent the hw accurate context save >> state. >> >> Against which we will compare of our reg state >> writer. > > Right, default_state is the HW version of our init_reg_state. > >> > + if (IS_ERR(hw)) { >> > + err = PTR_ERR(hw); >> > + break; >> > + } >> > + hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw); >> > + >> > + lrc = memset(mem, 0, PAGE_SIZE); >> > + execlists_init_reg_state(lrc, >> > + engine->kernel_context, >> > + engine, >> > + engine->kernel_context->ring, >> > + true); >> > + >> > + dw = 0; >> > + do { >> > + u32 lri = hw[dw]; >> > + >> > + if (lri == 0) { >> > + dw++; >> > + continue; >> > + } >> > + >> > + if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { >> > + pr_err("%s: Expected LRI command at dword %d, found %08x\n", >> > + engine->name, dw, lri); >> > + err = -EINVAL; >> > + break; >> > + } >> > + >> > + if (lrc[dw] != lri) { >> > + pr_err("%s: LRI command mismatch at dword %d, expected %08x found %08x\n", >> > + engine->name, dw, lri, lrc[dw]); >> > + err = -EINVAL; >> > + break; >> > + } >> > + >> > + lri &= 0x7f; >> > + lri++; >> > + dw++; >> > + >> > + while (lri) { >> > + if (hw[dw] != lrc[dw]) { >> > + pr_err("%s: Different registers found at dword %d, expected %x, found %x\n", >> > + engine->name, dw, hw[dw], lrc[dw]); >> > + err = -EINVAL; >> > + break; >> > + } >> > + >> > + /* >> > + * Skip over the actual register value as we >> > + * expect that to differ. >> > + */ >> > + dw += 2; >> > + lri -= 2; >> >> This makes me wonder if we could use this machinery post hang. Just to >> get a little more triage data out, ie 'your context looks corrupted at >> offset %x'... > > Certainly possible, but what we check here is _mostly_ the privileged > registers that are not really meant to be changed by the user -- and we > are only checking the offsets, so unlikely there to be just one wrong. > > The general principle was that we should provide raw information and > have the smarts in userspace (so that we could always enhance our > processing and reanalyse existing dumps). But at the end of the day, > whatever allows us to prevent bugs or fix bugs is paramount. > > But I'm not yet sold this helps. Maybe if we find an example where it > proves useful... > >> > + } >> > + } while ((lrc[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END); >> >> Ok, you tie up always the generate image. For future work add the hw batch >> endpoint be a part of checker? > > It's not always in the first page, I'm not even sure if a BB_END is > always included in the older gen. (I have a feeling the HW definitely > started including it ~gen10.) > -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx