Re: [PATCH v2 01/14] drm/i915: Keep a global seqno per-engine

On 15/02/2017 21:49, Chris Wilson wrote:
On Wed, Feb 15, 2017 at 05:05:40PM +0000, Tvrtko Ursulin wrote:

On 14/02/2017 09:54, Chris Wilson wrote:
Replace the global device seqno with one for each engine, and account
for in-flight seqnos on each separately. This is consistent with
dma-fence, as each timeline has a separate fence-context per engine
and a seqno is only ordered within a fence-context (i.e. seqnos do not
need to be ordered with respect to other engines, just within a single
engine). This is required to enable request rewinding for preemption on
individual engines (we have to rewind the seqno to avoid overflow, and
we do not have to rewind all engines just to preempt one).
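
To illustrate (a minimal sketch; the struct and helper names here are
illustrative rather than taken from the patch): each engine carries its
own timeline with an independent seqno counter, and the usual
wraparound-safe comparison is only meaningful within one such timeline.

/* one timeline per engine; seqnos are never compared across engines */
struct engine_timeline {
	u32 seqno;	/* last seqno allocated on this engine */
};

/* wraparound-safe "has a passed b?", valid within a single timeline */
static inline bool seqno_passed(u32 a, u32 b)
{
	return (s32)(a - b) >= 0;
}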

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/i915_debugfs.c      |  5 +--
drivers/gpu/drm/i915/i915_gem_request.c  | 68 +++++++++++++++-----------------
drivers/gpu/drm/i915/i915_gem_request.h  |  8 +---
drivers/gpu/drm/i915/i915_gem_timeline.h |  4 +-
drivers/gpu/drm/i915/intel_breadcrumbs.c | 33 +++++++---------
drivers/gpu/drm/i915/intel_engine_cs.c   |  2 -
drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 +-
7 files changed, 52 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cda957c674ee..9b636962cab6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1080,10 +1080,7 @@ static const struct file_operations i915_error_state_fops = {
static int
i915_next_seqno_get(void *data, u64 *val)
{
-	struct drm_i915_private *dev_priv = data;
-
-	*val = 1 + atomic_read(&dev_priv->gt.global_timeline.seqno);
-	return 0;
+	return -ENODEV;

I assume the reason for leaving this function in this state appears in a
later patch? Does gt.global_timeline stay around for something else?

There's no longer a single global seqno, so we tell userspace (igt) it can't
have it.

I missed that this is in debugfs and that we even have this facility. Does the exact errno matter here? I'm thinking of just dropping the vfunc entirely and letting the core return an error; after looking it up, it seems it would be -EACCES.
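
For reference, a sketch of that alternative (assuming the fops are built
with the usual DEFINE_SIMPLE_ATTRIBUTE pattern and the existing
i915_next_seqno_set setter is kept): with a NULL get callback, it is
simple_attr_read() in fs/libfs.c that returns -EACCES.

#include <linux/fs.h>

DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops,
			NULL,			/* no getter: reads fail with -EACCES */
			i915_next_seqno_set,	/* setter unchanged */
			"0x%llx\n");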

@@ -325,15 +328,19 @@ static int i915_gem_init_global_seqno(struct drm_i915_private *i915, u32 seqno)
	GEM_BUG_ON(i915->gt.active_requests > 1);

	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
-	if (!i915_seqno_passed(seqno, atomic_read(&timeline->seqno))) {
-		while (intel_breadcrumbs_busy(i915))
-			cond_resched(); /* spin until threads are complete */
-	}
-	atomic_set(&timeline->seqno, seqno);
+	for_each_engine(engine, i915, id) {
+		struct intel_timeline *tl = &timeline->engine[id];

-	/* Finally reset hw state */
-	for_each_engine(engine, i915, id)
+		if (!i915_seqno_passed(seqno, tl->seqno)) {
+			/* spin until threads are complete */
+			while (intel_breadcrumbs_busy(engine))
+				cond_resched();
+		}
+
+		/* Finally reset hw state */
+		tl->seqno = seqno;
		intel_engine_init_global_seqno(engine, seqno);
+	}

Came back here a bit later. Shouldn't you just handle one engine in
this function if seqnos are per-engine now?

No. We still have multiple engines listening to the seqno of others
(legacy semaphores). So if we wrap around on RCS, we have to idle the
xCS engines to be sure they complete any semaphores (semaphores check
for a >= value, so if we set future requests to be smaller, they would
have to wait a long time before a new RCS request overtakes the
semaphore value).

Ah right, forgot about semaphores.
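
To spell the hazard out with made-up numbers: the semaphore poll is a
plain unsigned >= compare against the signalling engine's seqno, so
rewinding the signaller leaves the waiter stuck until the counter climbs
all the way back.

u32 hw_wait_value = 0xfffffff0;	/* seqno xCS is polling RCS for */
u32 rcs_seqno = 1;		/* RCS seqno after being rewound */

/* stays false until RCS emits ~4 billion more requests, hence the
 * need to idle the other engines before rewinding */
bool done = rcs_seqno >= hw_wait_value;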

	/* We may be recursing from the signal callback of another i915 fence */
	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
	request->global_seqno = seqno;

This field could also be renamed to engine_seqno to be more
self-documenting.

That's going to be a wide-sweeping change, so let's see what it looks
like, e.g. i915_gem_request_get_engine_seqno()

On the other hand, I thought I called the timeline "[global]"

Okay, if it is too much churn, never mind then. I guess we can think of req->global_seqno as global to the engine ourselves. :)
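
Going back to the quoted hunk for a moment: the SINGLE_DEPTH_NESTING
annotation is needed because lockdep treats every request->lock as the
same lock class. A minimal sketch of the pattern (the two fences here
are hypothetical):

#include <linux/spinlock.h>

/* signalling fence a may recurse into the signal callback of fence b,
 * which is of the same lock class; annotate the inner acquisition so
 * lockdep permits one level of nesting instead of reporting a
 * false-positive deadlock */
spin_lock(&a->lock);
spin_lock_nested(&b->lock, SINGLE_DEPTH_NESTING);
/* ... signal b while holding a ... */
spin_unlock(&b->lock);
spin_unlock(&a->lock);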

>>> It would be better for the active seqno count to be managed on the
>>> same level for readability. By that I mean having the decrement in
>>> add_request where it was incremented.
>>
>> It's incremented in this function, so the unwind on error is here as
>> well.
>
> Ah, I guess you were referring to the decrement in request_alloc. Pulled
> that out to unreserve_seqno() to match the call to reserve_seqno().

Yes, I said the wrong thing. Got confused by jumping back and forth.
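
For anyone following along, the point of the pairing is that the
in-flight accounting is bumped and dropped at the same level of the call
chain. A sketch of the shape (the inflight_seqnos field is an assumption
based on the discussion, not quoted from the patch):

static int reserve_seqno(struct intel_engine_cs *engine)
{
	/* account one more in-flight seqno on this engine */
	engine->timeline->inflight_seqnos++;
	return 0;
}

static void unreserve_seqno(struct intel_engine_cs *engine)
{
	/* the matching decrement, on retire or on the error-unwind path */
	engine->timeline->inflight_seqnos--;
}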

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



