On Wed, Jun 10, 2015 at 06:16:20PM +0300, Imre Deak wrote:
> On ke, 2015-06-10 at 18:00 +0300, Ville Syrjälä wrote:
> > On Wed, Jun 10, 2015 at 05:55:24PM +0300, Imre Deak wrote:
> > > On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote:
> > > > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote:
> > > > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote:
> > > > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@xxxxxxxxx> wrote:
> > > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > > > > > > problem where the seqno written at GPU command completion time is not
> > > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale
> > > > > > > seqno and not completing the request (not considering the lost
> > > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > > > > > > of a lost interrupt, or that the command didn't complete somehow: when
> > > > > > > the coherency issue occurred I read the seqno via an uncached GTT
> > > > > > > mapping too. While the cached version of the seqno still showed the
> > > > > > > stale value, the one read via the uncached mapping was the correct one.
> > > > > > >
> > > > > > > Work around this issue by clflushing the corresponding CPU cacheline
> > > > > > > following any store of the seqno and preceding any reading of it. When
> > > > > > > reading it, do this only when the caller expects a coherent view.
> > > > > > >
> > > > > > > Testcase: igt/store_dword_loop_render
> > > > > > > Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++++++++++++++
> > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
> > > > > > >  2 files changed, 24 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > > index 9f5485d..88bc5525 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > > > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
> > > > > > >
> > > > > > >  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
> > > > > > >  {
> > > > > > > +	/*
> > > > > > > +	 * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
> > > > > > > +	 * storing the completed request's seqno occasionally doesn't
> > > > > > > +	 * invalidate the CPU cache. Work around this by clflushing the
> > > > > > > +	 * corresponding cacheline whenever the caller wants the coherency to
> > > > > > > +	 * be guaranteed. Note that this cacheline is known to be
> > > > > > > +	 * clean at this point, since we only write it in gen8_set_seqno(),
> > > > > > > +	 * where we also do a clflush after the write. So this clflush in
> > > > > > > +	 * practice becomes an invalidate operation.
> > > >
> > > > Did you compare and contrast with the gen6+ w/a? A clflush may just work
> > > > out quicker considering that the posting read would involve a spinlock
> > > > and fw dance.
> > >
> > > Actually, I did, but only saw that it works too, didn't benchmark it.
> > > I'd also think that clflush would be faster, since it's only a cache
> > > invalidate at this point. But I will compare the two things now.
> >
> > If an mmio read fixes it then it doesn't feel like a snoop problem after
> > all.
>
> Ok, I retract what I just said. I tried now and with the patch below and
> still see the problem.
> I must have remembered the testcase where I created a separate GTT
> mapping for the status page and read the seqno for that. Sorry for the
> confusion.

Useful to know. Also something else to try is to set_pages_array_wc (or
set_memory_wc) for our internal mapping of the hws. Though clflush is
likely to be less of a maintenance issue.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
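
For reference, since the quoted hunk above is cut off before the actual
flush call, here is a minimal sketch of how the clflush w/a described in
the commit message could be wired up. The intel_flush_status_page()
helper is an assumption (it would plausibly account for the
intel_ringbuffer.h part of the diffstat) built on the existing
drm_clflush_virt_range(); this is a sketch of the idea, not the verbatim
patch.

static inline void
intel_flush_status_page(struct intel_engine_cs *ring, int reg)
{
	/* Force the cacheline holding this status page dword out of the CPU cache. */
	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
			       sizeof(uint32_t));
}

static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
{
	/*
	 * BXT-A1 w/a: the MI_STORE_DATA_IMM writing the seqno may not
	 * invalidate the CPU cache, so drop the (known clean) cacheline
	 * before reading whenever the caller needs a coherent view.
	 */
	if (!lazy_coherency)
		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);

	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
}

static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
{
	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);

	/*
	 * Flush after the CPU store so the line is clean again and the
	 * flush in gen8_get_seqno() degenerates into an invalidate.
	 */
	intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
}

These would presumably be installed as ring->get_seqno/ring->set_seqno
in the gen8 ring setup, guarded by a check for the affected BXT
stepping.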
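
Chris' set_pages_array_wc()/set_memory_wc() suggestion would instead
make the kernel's own mapping of the hardware status page
write-combining, so every CPU read of the seqno goes to memory rather
than a possibly stale cacheline. A rough, untested sketch of that
direction, assuming the LRC-style status page backed by a GEM object
(the matching set_pages_array_wb() on teardown is omitted):

#include <asm/cacheflush.h>	/* set_pages_array_wc() */

static int make_status_page_wc(struct intel_engine_cs *ring)
{
	/* The status page is a single GEM-backed page kmap'ed by the driver. */
	struct page *page = sg_page(ring->status_page.obj->pages->sgl);

	/*
	 * Switch the kernel direct-map attribute of that page to WC so the
	 * cached copy of the seqno can never go stale; the attribute change
	 * also flushes the affected cachelines.
	 */
	return set_pages_array_wc(&page, 1);
}

The trade-off the thread hints at: with a WC mapping every seqno read
becomes an uncached read, whereas the clflush variant only pays when
the caller actually asks for coherency (lazy_coherency == false).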