Re: [RFC 0/8] Force preemption

Jeff McGee <jeff.mcgee@xxxxxxxxx> · Thu, 22 Mar 2018 09:01:37 -0700

On Thu, Mar 22, 2018 at 03:57:49PM +0000, Tvrtko Ursulin wrote:
> 
> On 22/03/2018 14:34, Jeff McGee wrote:
> >On Thu, Mar 22, 2018 at 09:28:00AM +0000, Chris Wilson wrote:
> >>Quoting Tvrtko Ursulin (2018-03-22 09:22:55)
> >>>
> >>>On 21/03/2018 17:26, jeff.mcgee@xxxxxxxxx wrote:
> >>>>From: Jeff McGee <jeff.mcgee@xxxxxxxxx>
> >>>>
> >>>>Force preemption uses engine reset to enforce a limit on the time
> >>>>that a request targeted for preemption can block. This feature is
> >>>>a requirement in automotive systems where the GPU may be shared by
> >>>>clients of critically high priority and clients of low priority that
> >>>>may not have been curated to be preemption friendly. There may be
> >>>>more general applications of this feature. I'm sharing as an RFC to
> >>>>stimulate that discussion and also to get any technical feedback
> >>>>that I can before submitting to the product kernel that needs this.
> >>>>I have developed the patches for ease of rebase, given that this is
> >>>>for the moment considered a non-upstreamable feature. It would be
> >>>>possible to refactor hangcheck to fully incorporate force preemption
> >>>>as another tier of patience (or impatience) with the running request.
> >>>
> >>>Sorry if it was mentioned elsewhere and I missed it - but does this work
> >>>only with stateless clients - or in other words, what would happen to
> >>>stateful clients which would be force preempted? Or the answer is we
> >>>don't care since they are misbehaving?
> >>
> >>They get notified of being guilty for causing a gpu reset; three strikes
> >>and they are out (banned from using the gpu) using the current rules.
> >>This is a very blunt hammer that requires the rest of the system to be
> >>robust; one might argue time spent making the system robust would be
> >>better served making sure that the timer never expired in the first place
> >>thereby eliminating the need for a forced gpu reset.
> >>-Chris
> >
> >Yes, for simplication the policy applied to force preempted contexts
> >is the same as for hanging contexts. It is known that this feature
> >should not be required in a fully curated system. It's a requirement
> >if end user will be alllowed to install 3rd party apps to run in the
> >non-critical domain.
> 
> My concern is whether it safe to call this force _preemption_, while
> it is not really expected to work as preemption from the point of
> view of preempted context. I may be missing some angle here, but I
> think a better name would include words like maximum request
> duration or something.
> 
> I can see a difference between allowed maximum duration when there
> is something else pending, and when it isn't, but I don't
> immediately see that we should consider this distinction for any
> real benefit?
> 
> So should the feature just be "maximum request duration"? This would
> perhaps make it just a special case of hangcheck, which ignores head
> progress, or whatever we do in there.
> 
> Regards,
> 
> Tvrtko

I think you might be unclear about how this works. We're not starting a
preemption to see if we can cleanly remove a request who has begun to
exceed its normal time slice, i.e. hangcheck. This is about bounding
the time that a normal preemption can take. So first start preemption
in response to higher-priority request arrival, then wait for preemption
to complete within a certain amount of time. If it does not, resort to
reset.

So it's really "force the resolution of a preemption", shortened to
"force preemption".
-Jeff
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx