Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Tue, 21 Feb 2017 14:16:52 +0000

On 21/02/2017 14:13, Imre Deak wrote:
On Tue, Feb 21, 2017 at 01:11:27PM +0000, Tvrtko Ursulin wrote:

On 21/02/2017 12:43, Imre Deak wrote:
On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:

On 21/02/2017 09:37, Chris Wilson wrote:
On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
So that our preempt-off period doesn't grow completely unchecked, or do
we need that 34ms loop?

Yes, that's at least how I understand it. Scheduling away is what let's
PCODE start servicing some other request than ours or go idle. That's
in a way what we see when the preempt-enabled poll times out.

I was thinking along the lines of if it was just busy/unavailable for the
first 33ms that particular time, it just needed to sleep until ready.
Once available, the next request ran in the expected 1ms.

Do you not see any value in trying a sleeping loop? Perhaps compromise
and have the preempt-disable timeout increase each iteration.

This fallback method would work too, but imo the worst case is what
matters and that would be anyway the same in both cases. Because of this
and since it's a WA I'd rather keep it simple.

Parachuting in so apologies if I misunderstood something.

Is the issue here that we can get starved out of CPU time for more than 33ms
while waiting for an event?

We need to actively resend the same request for this duration.

Could we play games with sched_setscheduler and maybe temporarily go
SCHED_DEADLINE or something? Would have to look into how to correctly
restore to the old state from that and from which contexts we can actually
end up in this wait.

What would be the benefit wrt. disabling preemption? Note that since
it's a workaround it would be good to keep it simple and close to how it
worked on previous platforms (SKL/APL).

It would be nicer not to relax that BUILD_BUG_ON in atomic wait for and, if
the main problem is the scheduler/CPU starvation, to see if it can be solved
differently. Even though the atomic wait here would trigger very rarely it
might be worth coming up with something nicer and generalized.

If I understood it correctly, the difference between this wait_for call site
and the rest is that here it wants a certain number of COND checks to be
guaranteed?

Yes.

The other call sites care more about checking on enter and exit.

So in this case we want the period parameter to actually be guaranteed (or
close). This sounded like a good candidate for SCHED_DEADLINE to me. Like
wait_for_periodic(COND, TIMEOUT, INTERVAL).

Could be. But this would give less guarantee than disabling preemption,
as SCHED_DEADLINE still works on a best effort basis. How about
increasing the timeout now (to 50ms) and trying what you suggest as a
follow-up? That way we have also something for -stable.

If 50ms works it is fine by me.

Regards,

Tvrtko