On Mon, 2019-03-11 at 10:39 +0000, Tvrtko Ursulin wrote: > On 08/03/2019 03:16, Carlos Santa wrote: > > On Fri, 2019-03-01 at 09:36 +0000, Chris Wilson wrote: > > > > > > > > > > Quoting Carlos Santa (2019-02-21 02:58:16) > > > > +#define GEN8_WATCHDOG_1000US(dev_priv) > > > > watchdog_to_clock_counts(dev_priv, 1000) > > > > +static void gen8_watchdog_irq_handler(unsigned long data) > > > > +{ > > > > + struct intel_engine_cs *engine = (struct > > > > intel_engine_cs > > > > *)data; > > > > + struct drm_i915_private *dev_priv = engine->i915; > > > > + unsigned int hung = 0; > > > > + u32 current_seqno=0; > > > > + char msg[80]; > > > > + unsigned int tmp; > > > > + int len; > > > > + > > > > + /* Stop the counter to prevent further timeout > > > > interrupts > > > > */ > > > > + I915_WRITE_FW(RING_CNTR(engine->mmio_base), > > > > get_watchdog_disable(engine)); > > > > + > > > > + /* Read the heartbeat seqno once again to check if we > > > > are > > > > stuck? */ > > > > + current_seqno = > > > > intel_engine_get_hangcheck_seqno(engine); > > > > > > I have said this before, but this doesn't exist either, it's just > > > a > > > temporary glitch in the matrix. > > > > > > > Chris, Tvrtko, I need some guidance on how to find the quilty seqno > > during a hang, can you please advice here what to do? > > When an interrupt fires you need to ascertain whether the same > request > which enabled the watchdog is running, correct? > > So I think you would need this, with a disclaimer that I haven't > thought > about the details really: > > 1. Take a reference to timeline hwsp when setting up the watchdog for > a > request. > > 2. Store the initial seqno associated with this request. > > 3. Force enable user interrupts. > > 4. When timeout fires, inspect the HWSP seqno to see if the request > completed or not. > > 5. Reset the engine if not completed. > > 6. Put the timeline/hwsp reference. static int gen8_emit_bb_start(struct i915_request *rq, u64 offset, u32 len, const unsigned int flags) { struct i915_timeline *tl; u32 seqno; if (enable_watchdog) { /* Start watchdog timer */ cs = gen8_emit_start_watchdog(rq, cs); tl = ce->ring->timeline; i915_timeline_get_seqno(tl, rq, &seqno); /*Store initial hwsp seqno associated with this request engine->watchdog_hwsp_seqno = tl->hwsp_seqno; } } static void gen8_watchdog_tasklet(unsigned long data) { struct i915_request *rq; rq = intel_engine_find_active_request(engine); /* Inspect the watchdog seqno once again for completion? */ if (!i915_seqno_passed(engine->watchdog_hwsp_seqno, rq- >fence.seqno)) { //Reset Engine } } Tvrtko, is the above acceptable to inspect whether the seqno has completed? I noticed there's a helper function i915_request_completed(struct i915_request *rq) but it will require me to modify it in order to pass 2 different seqnos. Regards, Carlos > > If the user interrupt fires with the request completed cancel the > above > operations. > > There could be an inherent race between inspecting the seqno and > deciding to reset. Not sure at the moment what to do. Maybe just call > it > bad luck? > > I also think for the software implementation you need to force no > request coalescing for contexts with timeout set. Because you want > to > have 100% defined borders for request in and out - since the timeout > is > defined per request. > > In this case you don't need the user interrupt for the trailing edge > signal but can use context complete. Maybe putting hooks into > context_in/out in intel_lrc.c would work under these circumstances. > > Also if preempted you need to cancel the timer setup and store > elapsed > execution time. > > Or it may make sense to just disable preemption for these contexts. > Otherwise there is no point in trying to mandate the timeout? > > But it is also kind of bad since non-privileged contexts can make > themselves non-preemptable by setting the watchdog timeout. > > Maybe as a compromise we need to automatically apply an elevated > priority level, but not as high to be completely non-preemptable. > Sounds > like a hard question. > > Regards, > > Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx