Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 21 Feb 2017 13:19:37 +0000

On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:
> On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> > 
> > On 21/02/2017 09:37, Chris Wilson wrote:
> > >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> > >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > >>>So that our preempt-off period doesn't grow completely unchecked, or do
> > >>>we need that 34ms loop?
> > >>
> > >>Yes, that's at least how I understand it. Scheduling away is what lets
> > >>PCODE start servicing some other request than ours or go idle. That's
> > >>in a way what we see when the preempt-enabled poll times out.
> > >
> > >I was thinking along the lines of if it was just busy/unavailable for the
> > >first 33ms that particular time, it just needed to sleep until ready.
> > >Once available, the next request ran in the expected 1ms.
> >
> > >Do you not see any value in trying a sleeping loop? Perhaps compromise
> > >and have the preempt-disable timeout increase each iteration.
> 
> This fallback method would work too, but imo the worst case is what
> matters and that would be anyway the same in both cases. Because of this
> and since it's a WA I'd rather keep it simple.
> 
> > Parachuting in so apologies if I misunderstood something.
> > 
> > Is the issue here that we can get starved out of CPU time for more than 33ms
> > while waiting for an event?
> 
> We need to actively resend the same request for this duration.
> 
> > Could we play games with sched_setscheduler and maybe temporarily go
> > SCHED_DEADLINE or something? Would have to look into how to correctly
> > restore to the old state from that and from which contexts we can actually
> > end up in this wait.
> 
> What would be the benefit wrt. disabling preemption? Note that since
> it's a workaround it would be good to keep it simple and close to how it
> worked on previous platforms (SKL/APL).

Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
interaction at all. Or that we don't handle the failure gracefully. Or
that the hw appears pretty flimsy and the communication method is hit
and miss.

I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
an order of magnitude beyond the expected response time.
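
For reference, the sleeping-loop compromise mentioned earlier in the
thread (preempt-off window growing each iteration) could look roughly
like the userspace sketch below. It is not driver code. All names
(fake_pcode, fake_pcode_request, fake_sleep, poll_escalating) are
hypothetical, and the 1ms starting window, the doubling, the 10us cost
per attempt and the 1ms sleep are assumptions picked for illustration.
It also models the hw as simply becoming ready after a fixed time, which
may not hold if scheduling away lets PCODE move on to other requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hw: busy until ready_at_us. */
struct fake_pcode {
	uint64_t now_us;       /* simulated clock */
	uint64_t ready_at_us;  /* requests succeed from this time on */
	unsigned int requests; /* how many times the request was sent */
};

/* Send the request once; assume each attempt costs 10us. */
static bool fake_pcode_request(struct fake_pcode *p)
{
	p->requests++;
	p->now_us += 10;
	return p->now_us >= p->ready_at_us;
}

/* Simulated sleep with preemption enabled. */
static void fake_sleep(struct fake_pcode *p, uint64_t us)
{
	p->now_us += us;
}

/*
 * Resend the request inside atomic windows that double each iteration,
 * sleeping between windows, until budget_us has elapsed.
 * Returns 0 on success, -1 on timeout.
 */
static int poll_escalating(struct fake_pcode *p, uint64_t budget_us)
{
	assert(p && budget_us > 0);

	uint64_t deadline = p->now_us + budget_us;
	uint64_t window_us = 1000; /* assumed 1ms first window */

	while (p->now_us < deadline) {
		uint64_t window_end = p->now_us + window_us;

		if (window_end > deadline)
			window_end = deadline;

		/* In the kernel, preempt_disable() would go here. */
		while (p->now_us < window_end) {
			if (fake_pcode_request(p))
				return 0;
		}
		/* ...and preempt_enable() here. */

		fake_sleep(p, 1000); /* let the scheduler run */
		window_us *= 2;      /* longer atomic window next time */
	}

	return -1;
}
```

With a 50ms budget the windows are 1, 2, 4, 8, 16ms and so on, so the
longest single preempt-off stretch stays well under the total budget
while the worst-case total wait is unchanged.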

50 I would ack. :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre