Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

Imre Deak <imre.deak@xxxxxxxxx> · Tue, 21 Feb 2017 16:13:47 +0200

On Tue, Feb 21, 2017 at 01:11:27PM +0000, Tvrtko Ursulin wrote:
> 
> On 21/02/2017 12:43, Imre Deak wrote:
> >On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 21/02/2017 09:37, Chris Wilson wrote:
> >>>On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> >>>>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> >>>>>So that our preempt-off period doesn't grow completely unchecked, or do
> >>>>>we need that 34ms loop?
> >>>>
> >>>>Yes, that's at least how I understand it. Scheduling away is what let's
> >>>>PCODE start servicing some other request than ours or go idle. That's
> >>>>in a way what we see when the preempt-enabled poll times out.
> >>>
> >>>I was thinking along the lines of if it was just busy/unavailable for the
> >>>first 33ms that particular time, it just needed to sleep until ready.
> >>>Once available, the next request ran in the expected 1ms.
> >>
> >>>Do you not see any value in trying a sleeping loop? Perhaps compromise
> >>>and have the preempt-disable timeout increase each iteration.
> >
> >This fallback method would work too, but imo the worst case is what
> >matters and that would be anyway the same in both cases. Because of this
> >and since it's a WA I'd rather keep it simple.
> >
> >>Parachuting in so apologies if I misunderstood something.
> >>
> >>Is the issue here that we can get starved out of CPU time for more than 33ms
> >>while waiting for an event?
> >
> >We need to actively resend the same request for this duration.
> >
> >>Could we play games with sched_setscheduler and maybe temporarily go
> >>SCHED_DEADLINE or something? Would have to look into how to correctly
> >>restore to the old state from that and from which contexts we can actually
> >>end up in this wait.
> >
> >What would be the benefit wrt. disabling preemption? Note that since
> >it's a workaround it would be good to keep it simple and close to how it
> >worked on previous platforms (SKL/APL).
> 
> It would be nicer not to relax that BUILD_BUG_ON in atomic wait for and, if
> the main problem is the scheduler/CPU starvation, to see if it can be solved
> differently. Even though the atomic wait here would trigger very rarely it
> might be worth coming up with something nicer and generalized.
> 
> If I understood it correctly, the difference between this wait_for call site
> and the rest is that here it wants a certain number of COND checks to be
> guaranteed?

Yes.

> The other call sites care more about checking on enter and exit.
> 
> So in this case we want the period parameter to actually be guaranteed (or
> close). This sounded like a good candidate for SCHED_DEADLINE to me. Like
> wait_for_periodic(COND, TIMEOUT, INTERVAL).

Could be. But this would give less guarantee than disabling preemption,
as SCHED_DEADLINE still works on a best effort basis. How about
increasing the timeout now (to 50ms) and trying what you suggest as a
follow-up? That way we have also something for -stable.

--Imre

> Maybe that could get away with the second atomic loop and be a generic
> solution on all platforms.

> 
> Regards,
> 
> Tvrtko