Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

Imre Deak <imre.deak@xxxxxxxxx> · Tue, 21 Feb 2017 16:18:47 +0200

On Tue, Feb 21, 2017 at 01:19:37PM +0000, Chris Wilson wrote:
> On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:
> > On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/02/2017 09:37, Chris Wilson wrote:
> > > >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> > > >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > > >>>So that our preempt-off period doesn't grow completely unchecked, or do
> > > >>>we need that 34ms loop?
> > > >>
> > > >>Yes, that's at least how I understand it. Scheduling away is what lets
> > > >>PCODE start servicing some other request than ours or go idle. That's
> > > >>in a way what we see when the preempt-enabled poll times out.
> > > >
> > > >I was thinking along the lines of if it was just busy/unavailable for the
> > > >first 33ms that particular time, it just needed to sleep until ready.
> > > >Once available, the next request ran in the expected 1ms.
> > >
> > > >Do you not see any value in trying a sleeping loop? Perhaps compromise
> > > >and have the preempt-disable timeout increase each iteration.
> > 
> > This fallback method would work too, but imo the worst case is what
> > matters and that would be anyway the same in both cases. Because of this
> > and since it's a WA I'd rather keep it simple.
> > 
> > > Parachuting in so apologies if I misunderstood something.
> > > 
> > > Is the issue here that we can get starved out of CPU time for more than 33ms
> > > while waiting for an event?
> > 
> > We need to actively resend the same request for this duration.
> > 
> > > Could we play games with sched_setscheduler and maybe temporarily go
> > > SCHED_DEADLINE or something? Would have to look into how to correctly
> > > restore to the old state from that and from which contexts we can actually
> > > end up in this wait.
> > 
> > What would be the benefit wrt. disabling preemption? Note that since
> > it's a workaround it would be good to keep it simple and close to how it
> > worked on previous platforms (SKL/APL).
> 
> Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
> interaction at all. Or that we don't handle the failure gracefully. Or
> that the hw appears pretty flimsy and the communication method is hit
> and miss.

Yes, me neither. It's clearly not by design, since based on the
specification two requests 3ms apart would need to be enough.
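
For reference, the resend scheme under discussion looks roughly like the
sketch below. It is a userspace illustration only, not the driver code:
fake_pcode_request(), resend_until_ready(), and the simulated clock are
made-up names, and the 3ms spacing and 50ms budget are just the numbers
mentioned in this thread.

```c
#include <stdbool.h>

/* Simulated clock in milliseconds (hypothetical, stands in for real time). */
static unsigned int now_ms;

/* Hypothetical: the time at which the simulated PCODE starts acking. */
static unsigned int ready_at_ms;

/* Hypothetical stand-in for sending one request and reading the reply. */
static bool fake_pcode_request(void)
{
	return now_ms >= ready_at_ms;
}

/*
 * Resend the same request every interval_ms until it is acked or until
 * timeout_ms has elapsed since the first attempt.
 * Returns 0 on success and -1 on timeout; *attempts counts requests sent.
 */
static int resend_until_ready(unsigned int interval_ms,
			      unsigned int timeout_ms,
			      unsigned int *attempts)
{
	unsigned int start = now_ms;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		if (fake_pcode_request())
			return 0;
		if (now_ms - start >= timeout_ms)
			return -1;
		/* Stands in for the delay between two requests. */
		now_ms += interval_ms;
	}
}
```

With ready_at_ms set to 34, a 3ms interval and a 50ms budget, the
request goes through on a later resend; with a PCODE that never becomes
ready the loop gives up once the budget is used up.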

> I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
> to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
> an order of magnitude beyond the expected response time.
> 
> 50 I would ack. :|

Ok, I can resend with that if Tvrtko agrees.

--Imre