On Mon, Sep 08, 2014 at 09:15:50AM +0100, Chris Wilson wrote:
> On Mon, Sep 08, 2014 at 10:03:51AM +0200, Daniel Vetter wrote:
> > On Sun, Sep 07, 2014 at 09:08:31AM +0100, Chris Wilson wrote:
> > > Running igt, I was encountering the invalid TLB bug on my 845g, despite
> > > it using the CS workaround. Examining the w/a buffer in the
> > > error state showed that the copy from the user batch into the
> > > workaround itself was suffering from the invalid TLB bug (the first
> > > cacheline was broken with the first two words reversed). Time to try a
> > > fresh approach. This extends the workaround to write into each page of
> > > our scratch buffer in order to overflow the TLB and evict the invalid
> > > entries. This could be refined to only do so after we update the GTT,
> > > but for simplicity, we do it before each batch.
> > >
> > > I suspect this supersedes our current workaround, but for safety keep
> > > doing both.
> >
> > I suspect that we might end up with just an elaborate delay
> > implementation, but if it works then it's good. One nitpick below; with
> > that addressed this is Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
>
> One way to test that is simply comparing 64x4096 byte writes in the same
> page vs 64x4 byte writes in 64 different pages. Both should have roughly
> the same latency (though with TLB fetches you can never be too sure) and
> would demonstrate whether it is the TLB or the delay that's the factor.
>
> In my testing, I did multiple copies into the batch w/a so that I was
> reasonably sure that the source was stable and that the copy of the
> source didn't match it N times.
>
> > >
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > ---
> > >  drivers/gpu/drm/i915/i915_reg.h         | 12 ++++---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 62 +++++++++++++++++++--------------
> > >  2 files changed, 44 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > index e4d7607da2c4..f29b44c86a2f 100644
> > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > @@ -334,16 +334,20 @@
> > >  #define GFX_OP_DESTBUFFER_INFO	((0x3<<29)|(0x1d<<24)|(0x8e<<16)|1)
> > >  #define GFX_OP_DRAWRECT_INFO	((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> > >  #define GFX_OP_DRAWRECT_INFO_I965	((0x7900<<16)|0x2)
> > > -#define SRC_COPY_BLT_CMD		((2<<29)|(0x43<<22)|4)
> > > +
> > > +#define COLOR_BLT_CMD		(2<<29 | 0x40<<22 | (5-2))
> > > +#define SRC_COPY_BLT_CMD		((2<<29)|(0x43<<22)|4)
> > >  #define XY_SRC_COPY_BLT_CMD		((2<<29)|(0x53<<22)|6)
> > >  #define XY_MONO_SRC_COPY_IMM_BLT	((2<<29)|(0x71<<22)|5)
> > > -#define XY_SRC_COPY_BLT_WRITE_ALPHA	(1<<21)
> > > -#define XY_SRC_COPY_BLT_WRITE_RGB	(1<<20)
> > > +#define BLT_WRITE_A			(2<<20)
> > > +#define BLT_WRITE_RGB		(1<<20)
> > > +#define BLT_WRITE_RGBA		(BLT_WRITE_RGB | BLT_WRITE_A)
> > >  #define BLT_DEPTH_8			(0<<24)
> > >  #define BLT_DEPTH_16_565		(1<<24)
> > >  #define BLT_DEPTH_16_1555		(2<<24)
> > >  #define BLT_DEPTH_32			(3<<24)
> > > -#define BLT_ROP_GXCOPY		(0xcc<<16)
> > > +#define BLT_ROP_SRC_COPY		(0xcc<<16)
> > > +#define BLT_ROP_COLOR_COPY		(0xf0<<16)
> > >  #define XY_SRC_COPY_BLT_SRC_TILED	(1<<15) /* 965+ only */
> > >  #define XY_SRC_COPY_BLT_DST_TILED	(1<<11) /* 965+ only */
> > >  #define CMD_OP_DISPLAYBUFFER_INFO ((0x0<<29)|(0x14<<23)|2)
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 16371a444426..acd933b1b027 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1368,49 +1368,59 @@ i830_dispatch_execbuffer(struct intel_engine_cs *ring,
> > >  			 u64 offset, u32 len,
> > >  			 unsigned flags)
> > >  {
> > > +	u32 cs_offset = ring->scratch.gtt_offset;
> > >  	int ret;
> > >
> > > -	if (flags & I915_DISPATCH_PINNED) {
> > > -		ret = intel_ring_begin(ring, 4);
> > > -		if (ret)
> > > -			return ret;
> > > +	ret = intel_ring_begin(ring, 6);
> > > +	if (ret)
> > > +		return ret;
> > >
> > > -		intel_ring_emit(ring, MI_BATCH_BUFFER);
> > > -		intel_ring_emit(ring, offset | (flags & I915_DISPATCH_SECURE ? 0 : MI_BATCH_NON_SECURE));
> > > -		intel_ring_emit(ring, offset + len - 8);
> > > -		intel_ring_emit(ring, MI_NOOP);
> > > -		intel_ring_advance(ring);
> > > -	} else {
> > > -		u32 cs_offset = ring->scratch.gtt_offset;
> > > +	/* Evict the invalid PTE TLBs */
> > > +	intel_ring_emit(ring, COLOR_BLT_CMD | BLT_WRITE_RGBA);
> > > +	intel_ring_emit(ring, BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096);
> > > +	intel_ring_emit(ring, 64 << 16 | 4); /* load each page */
> > s/64/(I830M_BATCH_LIMIT / 4096)/ for clarity
>
> Hmm, I don't think so. My thinking here is that this is TLB entries,
> so perhaps
>
> #define I830M_TLB_ENTRIES 64
> #define I830M_BATCH_LIMIT 256*1024 /* uAPI convention */
>
> #define I830M_WA_BUFSIZE MAX(GTT_PAGE_SIZE * I830M_TLB_ENTRIES, I830M_BATCH_LIMIT)

Yeah, that works better. Or a comment explaining that we have 64 TLB
entries. Either is fine, as long as we have a symbolic link to the scratch
w/a buffer size, so that we don't overwrite random memory by accident
(rough sketch below).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
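
For reference, a rough sketch of how the suggested defines could be wired
up, combining the I830M_* names proposed above with the emit sequence from
the quoted hunk. This is illustrative only, not the final patch: the quoted
diff stops after the third intel_ring_emit(), so the last three dwords
(destination address, solid colour, flush) are an assumption about how the
6-dword allocation gets filled, and GTT_PAGE_SIZE is spelled out here just
for the sketch.

/* i915_reg.h side of the sketch -- names follow the I830M_* proposal,
 * values are placeholders. */
#define GTT_PAGE_SIZE		4096
#define I830M_TLB_ENTRIES	64		/* 830/845 CS TLB entries */
#define I830M_BATCH_LIMIT	(256*1024)	/* uAPI convention */
#define I830M_WA_BUFSIZE	max(GTT_PAGE_SIZE * I830M_TLB_ENTRIES, \
				    I830M_BATCH_LIMIT)

/* intel_ringbuffer.c side, inside i830_dispatch_execbuffer() where
 * cs_offset = ring->scratch.gtt_offset points at the w/a buffer, which
 * must be at least I830M_WA_BUFSIZE so the blit below cannot scribble
 * past it. */
	ret = intel_ring_begin(ring, 6);
	if (ret)
		return ret;

	/* Evict the invalid PTE TLBs: write 4 bytes into each of the 64
	 * pages backing the scratch buffer (dst pitch = one page), so every
	 * TLB entry is reloaded with a valid PTE before the user batch. */
	intel_ring_emit(ring, COLOR_BLT_CMD | BLT_WRITE_RGBA);
	intel_ring_emit(ring, BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096);
	intel_ring_emit(ring, I830M_TLB_ENTRIES << 16 | 4); /* 64 rows x 4 bytes */
	intel_ring_emit(ring, cs_offset);	/* dst: scratch w/a buffer (assumed) */
	intel_ring_emit(ring, 0);		/* solid colour, value irrelevant (assumed) */
	intel_ring_emit(ring, MI_FLUSH);	/* assumed use of the sixth dword */
	intel_ring_advance(ring);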