Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Tue, 21 Feb 2017 13:11:27 +0000

On 21/02/2017 12:43, Imre Deak wrote:
On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:

On 21/02/2017 09:37, Chris Wilson wrote:
On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
So that our preempt-off period doesn't grow completely unchecked, or do
we need that 34ms loop?

Yes, that's at least how I understand it. Scheduling away is what lets
PCODE start servicing some other request than ours or go idle. That's
in a way what we see when the preempt-enabled poll times out.

I was thinking along the lines of if it was just busy/unavailable for the
first 33ms that particular time, it just needed to sleep until ready.
Once available, the next request ran in the expected 1ms.

Do you not see any value in trying a sleeping loop? Perhaps compromise
and have the preempt-disable timeout increase each iteration.

This fallback method would work too, but imo the worst case is what
matters, and that would be the same in both cases anyway. Because of this,
and since it's a WA, I'd rather keep it simple.

Parachuting in so apologies if I misunderstood something.

Is the issue here that we can get starved out of CPU time for more than 33ms
while waiting for an event?

We need to actively resend the same request for this duration.

Could we play games with sched_setscheduler and maybe temporarily go
SCHED_DEADLINE or something? Would have to look into how to correctly
restore to the old state from that and from which contexts we can actually
end up in this wait.

What would be the benefit wrt. disabling preemption? Note that since
it's a workaround it would be good to keep it simple and close to how it
worked on previous platforms (SKL/APL).

It would be nicer not to relax that BUILD_BUG_ON in the atomic wait_for and,
if the main problem is scheduler/CPU starvation, to see if it can be
solved differently. Even though the atomic wait here would trigger very
rarely, it might be worth coming up with something nicer and more general.

If I understood it correctly, the difference between this wait_for call
site and the rest is that here it wants a certain number of COND checks
to be guaranteed? The other call sites care more about checking on entry
and exit.

So in this case we want the period parameter to actually be guaranteed 
(or close). This sounded like a good candidate for SCHED_DEADLINE to me. 
Like wait_for_periodic(COND, TIMEOUT, INTERVAL).
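
To make the intended semantics concrete, here is a rough userspace model of
what such a helper might look like. It is only a sketch under stated
assumptions: wait_for_periodic(), cond_fn, now_ns(), sleep_until_ns() and
count_down() are all hypothetical names, nothing here is existing i915 code,
and it says nothing about how the kernel side would obtain the scheduling
guarantee. Sleeping to absolute deadlines keeps the COND checks on a fixed
grid so that one late wakeup does not push back all later checks; whether
each wakeup actually arrives on time is what a policy such as SCHED_DEADLINE
would have to provide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace model; none of these names exist in i915. */
typedef bool (*cond_fn)(void *ctx);

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep until an absolute CLOCK_MONOTONIC time, restarting on signals. */
static void sleep_until_ns(int64_t t)
{
	struct timespec ts = {
		.tv_sec = t / 1000000000LL,
		.tv_nsec = t % 1000000000LL,
	};

	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
		;
}

/*
 * Check cond() at t0, t0 + interval, t0 + 2 * interval, ... until it
 * returns true or the timeout expires.
 * Returns 0 on success and -ETIMEDOUT on timeout.
 */
static int wait_for_periodic(cond_fn cond, void *ctx,
			     int64_t timeout_us, int64_t interval_us)
{
	int64_t next = now_ns();
	const int64_t end = next + timeout_us * 1000;

	assert(timeout_us >= 0 && interval_us > 0);

	for (;;) {
		/* Sample expiry before COND so a late wakeup still gets a final check. */
		const bool expired = now_ns() >= end;

		if (cond(ctx))
			return 0;
		if (expired)
			return -ETIMEDOUT;

		/* Next check on the fixed grid, clamped to the deadline. */
		next += interval_us * 1000;
		if (next > end)
			next = end;
		sleep_until_ns(next);
	}
}

/* Demo condition: becomes true on the Nth call. */
static bool count_down(void *ctx)
{
	int *remaining = ctx;

	return --(*remaining) <= 0;
}
```

With an int initialised to 3, wait_for_periodic(count_down, &n, 500000, 1000)
would return 0 on the third check, roughly 2ms after entry, assuming the
wakeups are delivered on time.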

Maybe that could do away with the second atomic loop and be a generic
solution on all platforms.

Regards,

Tvrtko