Re: [PATCH 14/32] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+

Dave Gordon <david.s.gordon@xxxxxxxxx> · Tue, 5 Jan 2016 12:45:58 +0000

On 11/12/15 11:33, Chris Wilson wrote:
In order to ensure seqno/irq coherency, we current read a ring register.
We are not sure quite how it works, only that is does. Experiments show
that e.g. doing a clflush(seqno) instead is not sufficient, but we can
remove the forcewake dance from the mmio access.

v2: Baytrail wants a clflush too.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
---
  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6cecc15ec01b..69dd69e46fa9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
  {
  	/* Workaround to force correct ordering between irq and seqno writes on
  	 * ivb (and maybe also on snb) by reading from a CS register (like
-	 * ACTHD) before reading the status page. */
+	 * ACTHD) before reading the status page.
+	 *
+	 * Note that this effectively effectively stalls the read by the time
+	 * it takes to do a memory transaction, which more or less ensures
+	 * that the write from the GPU has sufficient time to invalidate
+	 * the CPU cacheline. Alternatively we could delay the interrupt from
+	 * the CS ring to give the write time to land, but that would incur
+	 * a delay after every batch i.e. much more frequent than a delay
+	 * when waiting for the interrupt (with the same net latency).
+	 */
  	if (!lazy_coherency) {
  		struct drm_i915_private *dev_priv = ring->dev->dev_private;
-		POSTING_READ(RING_ACTHD(ring->mmio_base));
+		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
  	}

  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);

Hmm ... would putting the flush before the POSTING_READ be better?

Depending on how the h/w implements the cacheline invalidation, it might 
allow some overlap between the cache controller's internal activities 
and the MMIO cycle ...

Also, previously we only had the flush on BXT, whereas now you're doing 
it on all gen6+. I think this is probably a good thing, but just 
wondered whether there's any downside to it?

Also ... are we sure that no-one calls this without having a forcewake 
in effect at the time, in particular debugfs? Or is it not going to end 
up going through here once lazy_coherency is abolished?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx