Re: [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 11 Feb 2019 16:56:03 +0000

On 11/02/2019 12:44, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-02-11 12:40:07)

On 06/02/2019 13:03, Chris Wilson wrote:
To determine whether an engine has 'stuck', we simply check whether or
not it is still on the same seqno for several seconds. To keep this simple
mechanism intact over the loss of a global seqno, we can simply add a
new global heartbeat seqno instead. As we cannot know the sequence in
which requests will then be completed, we use a primitive random number
generator instead (with a cycle long enough to not matter over an
interval of a few thousand requests between hangcheck samples).
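
For illustration, the scheme described above could be sketched roughly as follows. This is a userspace sketch, not the code from the patch: the struct, the function names and the choice of a 32-bit linear congruential generator are all assumptions made for the example, and it ignores the idle-engine case that a real hangcheck has to handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-engine state; the names are illustrative, not i915's. */
struct engine_sketch {
	uint32_t prng_state;        /* advanced once per submitted request */
	uint32_t current_heartbeat; /* value of the last request to complete */
	uint32_t last_sample;       /* value seen by the previous hangcheck */
	unsigned int stalled;       /* consecutive samples with no change */
};

/*
 * Assumed generator: a 32-bit LCG with a full period of 2^32, so a value
 * cannot repeat within the few thousand requests between two samples.
 */
static uint32_t next_heartbeat(uint32_t state)
{
	return state * 1664525u + 1013904223u;
}

/* On submission, hand the request the next pseudo-random value. */
static uint32_t submit_request(struct engine_sketch *e)
{
	e->prng_state = next_heartbeat(e->prng_state);
	return e->prng_state;
}

/* On completion (in any order), publish the request's value. */
static void complete_request(struct engine_sketch *e, uint32_t heartbeat)
{
	e->current_heartbeat = heartbeat;
}

/*
 * Periodic sample: only equality with the previous sample matters, so the
 * order in which requests complete is irrelevant. Returns true once the
 * value has stayed unchanged for 'threshold' consecutive samples.
 */
static bool hangcheck_sample(struct engine_sketch *e, unsigned int threshold)
{
	if (e->current_heartbeat != e->last_sample) {
		e->last_sample = e->current_heartbeat;
		e->stalled = 0;
		return false;
	}
	return ++e->stalled >= threshold;
}
```

The point of the sketch is that the sampler never compares values for ordering, only for change, which is why any sufficiently long non-repeating sequence would do.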

We couldn't keep the global seqno just for hangcheck purposes? I mean as
long as it is unique, which would be guaranteed by obtaining an
increment on every submission to hw and storing it in atomic_t
i915->hangcheck_global_seqno / rq->hangcheck_global_seqno, hangcheck
does not care about the order of execution, no?

s/global_seqno/hangcheck_seqno/ ?

Yes sure, I was just trying to express the idea that a "globally" unique 
number is all that I thought we need. Like:

    rq->hangcheck_seqno = atomic_inc_return(&i915->hangcheck_seqno);

Did I get that right then? That we don't really need the pseudo random 
number solution? We could even avoid calling it a seqno if desired. 
rq->unique, wait.. we possibly had this name for something in the past..
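
A rough userspace sketch of that idea, using C11 atomics in place of the kernel's atomic_t, might look like this. The struct and function names are hypothetical, chosen only for the example; the one thing taken from the line above is the increment-and-return on submission.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the device and request structures. */
struct device_sketch {
	atomic_uint hangcheck_seqno; /* device-wide counter, never reused until wrap */
};

struct request_sketch {
	unsigned int hangcheck_seqno; /* unique stamp taken at submission */
};

/*
 * Stamp a request on submission to hardware. atomic_fetch_add() returns
 * the old value, so adding 1 gives the post-increment value, mirroring
 * what atomic_inc_return() yields in the kernel.
 */
static void stamp_request(struct device_sketch *dev, struct request_sketch *rq)
{
	rq->hangcheck_seqno = atomic_fetch_add(&dev->hangcheck_seqno, 1) + 1;
}
```

Each stamp is distinct until the counter wraps, which is all that a change-detecting hangcheck would need; nothing here implies an execution order.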

(a) the goal is to kill off global_seqno entirely so we are all sure
there is no such seqno or ordering anymore
(b) this is a temporary patch and we kill off hangcheck_seqno, just as
soon as I can submit requests without struct_mutex

The heartbeat request solution? Is that better than the hangcheck seqno?

Regards,

Tvrtko