On 8/14/2020 8:57 AM, Chris Wilson wrote:
On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is,
we see the GPU report more context-switch entries for us to parse, but
those entries have not yet been written, leading us to process stale events
and eventually to report a hung GPU.
However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that takes, we can set a reliable
upper bound on how long we should wait:
513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
Suggested-by: Bruce Chang <yu.bruce.chang@xxxxxxxxx>
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Bruce Chang <yu.bruce.chang@xxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v5.4
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index db982fc0f0bc..3b8161c6b601 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2498,9 +2498,22 @@ invalidate_csb_entries(const u64 *first, const u64 *last)
*/
static inline bool gen12_csb_parse(const u64 *csb)
{
- u64 entry = READ_ONCE(*csb);
- bool ctx_away_valid = GEN12_CSB_CTX_VALID(upper_32_bits(entry));
- bool new_queue =
+ bool ctx_away_valid;
+ bool new_queue;
+ u64 entry;
+
+ /* XXX HSD */
+ entry = READ_ONCE(*csb);
+ if (unlikely(entry == -1)) {
+ preempt_disable();
+ if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 50))
+ GEM_WARN_ON("50us CSB timeout");
Our tests showed that 10us is not long enough, but 20us worked well, so
50us should be good enough.
+ preempt_enable();
+ }
+ WRITE_ONCE(*(u64 *)csb, -1);
A wmb() is probably needed here. It should be fine if the CSB is in SMEM,
but if the CSB is allocated in LMEM the mapping will be write-combining (WC),
so the write from WRITE_ONCE() may still be sitting in the write-combine
buffer and not in any cache, i.e. not yet visible to the HW.
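Something along these lines (just a sketch on my part; it mirrors the
wmb() already used after the csb_write reset in reset_csb_pointers()):

	/* Reset the entry so we can detect the next update from the GPU */
	WRITE_ONCE(*(u64 *)csb, -1);
	wmb(); /* flush the WC buffer so the reset value is visible to HW */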
+
+ ctx_away_valid = GEN12_CSB_CTX_VALID(upper_32_bits(entry));
+ new_queue =
lower_32_bits(entry) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE;
/*
@@ -3995,6 +4008,8 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
WRITE_ONCE(*execlists->csb_write, reset_value);
wmb(); /* Make sure this is visible to HW (paranoia?) */
+ /* Check that the GPU does indeed update the CSB entries! */
+ memset(execlists->csb_status, -1, (reset_value + 1) * sizeof(u64));
invalidate_csb_entries(&execlists->csb_status[0],
&execlists->csb_status[reset_value]);