Quoting Ville Syrjälä (2019-01-10 16:03:21) > On Thu, Jan 10, 2019 at 10:38:07AM +0000, Chris Wilson wrote: > > Broadwater and the rest of gen4 do support being able to saving and > > reloading context specific registers between contexts, providing isolation > > of the basic GPU state (as programmable by userspace). This allows > > userspace to assume that the GPU retains their state from one batch to the > > next, minimising the amount of state it needs to reload and manually save > > across batches. > > > > v2: CONSTANT_BUFFER woes > > > > Running through piglit turned up an interesting issue, a GPU hang inside > > the context load. The context image includes the CONSTANT_BUFFER command > > that loads an address into a on-gpu buffer, and the context load was > > executing that immediately. However, since it was reading from the GTT > > there is no guarantee that the GTT retains the same configuration as > > when the context was saved, resulting in stray reads and a GPU hang. > > > > Having tried issuing a CONSTANT_BUFFER (to disable the command) from the > > ring before saving the context to no avail, we resort to patching out > > the instruction inside the context image before loading. > > > > This does impose that gen4 always reissues CONSTANT_BUFFER commands on > > each batch, but due to the use of a shared GTT that was and will remain > > a requirement. > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > Cc: Kenneth Graunke <kenneth@xxxxxxxxxxxxx> > > Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> #v1 > > --- > > drivers/gpu/drm/i915/intel_engine_cs.c | 2 +- > > drivers/gpu/drm/i915/intel_gpu_commands.h | 3 +++ > > drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++++++++ > > 3 files changed, 21 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > > index f89b8f199e3f..88109e0de051 100644 > > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > > @@ -219,6 +219,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class) > > return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64, > > PAGE_SIZE); > > case 5: > > + case 4: > > /* > > * There is a discrepancy here between the size reported > > * by the register and the size of the context layout > > @@ -235,7 +236,6 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class) > > cxt_size * 64, > > cxt_size - 1); > > return round_up(cxt_size * 64, PAGE_SIZE); > > - case 4: > > case 3: > > case 2: > > /* For the special day when i810 gets merged. */ > > diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h > > index 105e2a9e874a..00c0175c37ed 100644 > > --- a/drivers/gpu/drm/i915/intel_gpu_commands.h > > +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h > > @@ -266,6 +266,9 @@ > > #define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \ > > ((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16)) > > > > +#define GFX_OP_CONSTANT_BUFFER \ > > + (0x3 << 29 | 0x0 << 27 | 0x0 << 24 | 0x2 << 16) > > + > > #define MFX_WAIT ((0x3<<29)|(0x1<<27)|(0x0<<16)) > > > > #define COLOR_BLT ((0x2<<29)|(0x40<<22)) > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index 889f3de79dd0..21bd71cf2e94 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -1632,6 +1632,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags) > > len += 2 + (num_rings ? 4*num_rings + 6 : 0); > > else if (IS_GEN(i915, 5)) > > len += 2; > > + else if (IS_GEN(i915, 4)) > > + len += 4; > > if (flags & MI_FORCE_RESTORE) { > > GEM_BUG_ON(flags & MI_RESTORE_INHIBIT); > > flags &= ~MI_FORCE_RESTORE; > > @@ -1668,6 +1670,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags) > > * this should never take effect and so be a no-op! > > */ > > *cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN; > > + } else if (IS_GEN(i915, 4)) { > > + /* > > + * Disable CONSTANT_BUFFER before it is loaded from the context > > + * image. For as it is loaded, it is executed and the stored > > + * address may no longer be valid, leading to a GPU hang. > > + * > > + * This imposes the requirement that userspace reload their > > + * CONSTANT_BUFFER on every batch, fortunately a requirement > > + * they are already accustomed to from before contexts were > > + * enabled. > > + */ > > + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; > > + *cs++ = 0; > > + *cs++ = i915_ggtt_offset(rq->hw_context->state) + 0x1d4; > > Is that offset correct for ctg/elk? The docs suggest that it is not. > Though it looks like it'd land inside 3DSTATE_SAMPLER_PALETTE_LOAD_1 > so probably not something anyone would notice. I just checked piglit didn't break. Admittedly a bit blaise :) > > + *cs++ = GFX_OP_CONSTANT_BUFFER; /* inactive */ > > I'm thinking there are a few ways in which we could even end up > with an inconsistent curbe setup in the context. > > Eg. something like this: > CS_URB_STATE(num>0) > URB_FENCE(cs>0) > CONSTANT_BUFFER(len>0) > CS_URB_STATE(num=0) > URB_FENCE(cs=0) > ctx save > ctx restore > > The restore would end up trying to issue CONSTANT_BUFFER with > len > 0 even though nothing is allocated for the CS unit in > the URB. > > That didn't seem to be the case in your earlier hang though > so not quite sure why it was hanging. And I'm not sure this > would even cause a hang. The spec just says "undefined". > > Bspec has this to say about CONSTANT_BUFFER: > "Programming Notes > Constant Buffers can only be allocated in linear (not tiled) graphics memory > Constant Buffers can only be mapped to Main Memory (UC)" > > Maybe we were violating one of those? Though I don't think we could > even violate the tiled vs. linear restriction unless the memory access > goes through the gtt fence for some reason. I didn't think gen4+ do > that anymore. So that maybe leaves the possibility that a snooped > bo got mapped to the same address. But again I'm not sure if that > would cause a hang or just potentiall stale data being loaded into > the CURBE. Agreed that tiling shouldn't be an issue with the gen4 architecture (and that's probably just cut'n'paste from older). snooped is also unlikely since we/mesa are definitely not allocating snooped buffers on the fly, so the only snoop buffers should be the static HWSP -- and that's not even in the GGTT for Crestline. So perhaps it is an overlapping fence? > Another mystery is why g4x/ilk wouldn't need this. There's nothing in > the docs to suggest that it should behave any differently. Hmm. Maybe > MI_INVALIDATE_ISP could have something to do with this? Maybe that > forces CONSTANT_BUFFER to be saved with valid==0 always? I guess it > might be semi-interesting to peek at the resulting context image > for. The ctg/ilk context image is ugly and looks much more like line noise (like there's a magically 255 dword 3DSTATE command full of junk.) It could just be with the larger GGTT for g4x/ilk, piglit run gpu wasn't able to trigger the same issue. > After a bit of digging I found a few more potentially related > tidbits: > > ECOSPKD[4] Constant Buffer Save/Restore Disable [DevBW-C1+] > > ILK_Chicken_1[7] ILK-C0: Curbe 8HW drop ECO > 0: Fix is enabled to complete the 8HW CURBE restore FIFO clear > 1: Fix is disable. Behavior is same as ILK B0 > > ILK_Chicken_1[0] [DevILK-B0] > “1” CS will force URB modify_enables set after context restore > “0” CS will not force URB modify_enables set after context restore > > All of those default to 0 AFAICS, so only ILK_Chicken_1[7] seems like it > could have any real effect. So wouldn't explain g4x working. That does look interesting. Thanks, -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx