Re: [PATCH 08/10] drm/i915: Cancel non-persistent contexts on close

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Fri, 11 Oct 2019 16:41:16 +0100



Quoting Chris Wilson (2019-10-11 15:22:17)
> Quoting Tvrtko Ursulin (2019-10-11 14:55:00)
> > 
> > On 10/10/2019 08:14, Chris Wilson wrote:
> > > +             if (engine)
> > > +                     active |= engine->mask;
> > > +
> > > +             dma_fence_put(fence);
> > > +     }
> > > +
> > > +     /*
> > > +      * Send a "high priority pulse" down the engine to cause the
> > > +      * current request to be momentarily preempted. (If it fails to
> > > +      * be preempted, it will be reset). As we have marked our context
> > > +      * as banned, any incomplete request, including any running, will
> > > +      * be skipped following the preemption.
> > > +      */
> > > +     reset = 0;
> > > +     for_each_engine_masked(engine, gt->i915, active, tmp)
> > > +             if (intel_engine_pulse(engine))
> > > +                     reset |= engine->mask;
> > 
> > What if we were able to send a pulse, but the hog cannot be preempted 
> > and hangcheck is obviously disabled - who will do the reset?
> 
> Hmm, the idea is that forced-preemption causes the reset.
> (See igt/gem_ctx_persistence/hostile)
> 
> However, if we give the sysadmin the means to disable force-preemption,
> we just gave them another shovel to dig a hole with.
> 
> A last resort would be another timer here to ensure the context was
> terminated.

That does not cut it, as we are only looking at it from the point of
view of the context being guilty, not the victim. So the answer remains
forced preemption, plus a backdoor if that is disabled.
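
To make the decision concrete, here is a rough standalone sketch of how
the reset mask could be chosen. It is a toy model, not the i915 code:
struct toy_engine, pulse_ok, toy_engines_to_reset() and the
forced_preemption flag are all hypothetical names, and treating the
"backdoor" as "reset every active engine directly" is only an assumed
interpretation of what such a fallback might do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct toy_engine {
	uint32_t mask;	/* one bit per engine */
	bool pulse_ok;	/* could the high-priority pulse be queued? */
};

/*
 * Return the mask of engines that need an explicit reset.
 *
 * A pulse that could not be sent always requires a reset. If forced
 * preemption is disabled, a queued pulse has nothing to time it out,
 * so (assumed backdoor) every active engine is reset directly.
 */
static uint32_t toy_engines_to_reset(const struct toy_engine *engines,
				     unsigned int count, uint32_t active,
				     bool forced_preemption)
{
	uint32_t reset = 0;
	unsigned int i;

	assert(engines || !count);

	for (i = 0; i < count; i++) {
		/* Skip engines the closed context was not running on. */
		if (!(engines[i].mask & active))
			continue;

		if (!engines[i].pulse_ok || !forced_preemption)
			reset |= engines[i].mask;
	}

	return reset;
}
```

With forced preemption enabled, only engines whose pulse failed end up
in the mask; with it disabled, every active engine does.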
-Chris