Re: [RFC] drm/i915: Temporarily go realtime when polling PCODE

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Thu, 23 Feb 2017 13:00:32 +0000

On 23/02/2017 12:01, Imre Deak wrote:
On Thu, Feb 23, 2017 at 09:37:29AM +0000, Tvrtko Ursulin wrote:
[...]
Having read the spec I think I see both sides now.

Spec is actually suggesting we should busy-retry the pcode request for 3ms
in this case.

Well, retry for 3ms without setting any minimum for the number of
requests. That couldn't be guaranteed anyway due to scheduling etc, and
would be a strange ABI. Later Art Runyan clarified this in the way it's
described in the code comment: What is required is two requests at
least 3ms apart. The first request is queued by the firmware and the
second request signals completion.

Why is our loop then spamming the hardware with requests every 10us?
Could that perhaps be counter-productive? Would a single sleeping loop
with a long timeout and a 3ms period not work? Like:

	ret = _wait_for(COND, 50 * 1000, timeout_base_ms * 1000);

?
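
To make the comparison concrete, here is a rough user-space model of the two polling periods. It is only a sketch under stated assumptions: `fake_pcode`, `fake_pcode_request()` and `poll_pcode()` are hypothetical names, the clock is simulated, and the firmware model (the first request is queued, a later request at least 3ms after it reports completion) is just the clarification quoted above turned into code, not the real PCODE interface or the i915 implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical firmware model, NOT the real PCODE interface. */
struct fake_pcode {
	bool queued;            /* the first request has been seen */
	uint64_t queued_at_us;  /* simulated time of the first request */
	unsigned int requests;  /* total number of requests received */
};

#define PCODE_MIN_GAP_US 3000 /* "two requests at least 3ms apart" */

/* Returns true once a request arrives at least 3ms after the first one. */
static bool fake_pcode_request(struct fake_pcode *pc, uint64_t now_us)
{
	pc->requests++;
	if (!pc->queued) {
		pc->queued = true;
		pc->queued_at_us = now_us;
		return false;
	}
	return now_us - pc->queued_at_us >= PCODE_MIN_GAP_US;
}

/*
 * Send a request every period_us until one succeeds or timeout_us expires.
 * Sleeping is simulated by advancing *now_us.
 * Returns 0 on success, -1 on timeout.
 */
static int poll_pcode(struct fake_pcode *pc, uint64_t *now_us,
		      uint64_t timeout_us, uint64_t period_us)
{
	uint64_t deadline = *now_us + timeout_us;

	for (;;) {
		/* Sample expiry first so one final request is always sent. */
		bool expired = *now_us >= deadline;

		if (fake_pcode_request(pc, *now_us))
			return 0;
		if (expired)
			return -1;
		*now_us += period_us; /* stand-in for usleep_range() */
	}
}
```

In this model a 3ms period completes on the second request, while a 10us period sends 301 requests to reach the same point. Whether the extra requests do any harm on real hardware is exactly the open question.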

It doesn't say how many retries we are supposed to do and how it internally
operates, which makes me unsure if our first more relaxed polling is perhaps
causing or contributing to the issue.

One thing where we don't follow the spec is the timeout for the
GEN6_PCODE_READY poll, which the spec says should be 150us and not 500ms. I
don't know if this timeout was triggered in the bug reports?

No, this PCODE_READY poll always succeeds; it's the reply/reply_mask
response which doesn't get set in time.

Yes, I know. I was just thinking that if it takes more than 2us, it then
falls back to scheduling and usleep_range. That was when I thought it was
really important to poll rapidly. Since you explained above that it is
just the opposite, I agree this part is not a problem. It still may make
sense to wait for that bit for a shorter period, as per bspec.
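
As an illustration of the two-phase wait described here (a short busy-poll, then falling back to sleeping), a minimal simulated sketch might look like the following. The names `fake_reg`, `wait_stats` and `wait_two_phase()` are hypothetical, the clock is simulated, and the 10us sleep step used in the example values is an assumption. Only the 2us fast window and the 150us bspec timeout come from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated register: the ready bit sets at ready_at_us. */
struct fake_reg {
	uint64_t ready_at_us;
};

/* Counters so the caller can see which phase did the waiting. */
struct wait_stats {
	unsigned int spins;  /* busy-poll iterations in the fast phase */
	unsigned int sleeps; /* simulated sleeps in the slow phase */
};

static bool reg_ready(const struct fake_reg *r, uint64_t now_us)
{
	return now_us >= r->ready_at_us;
}

/*
 * Busy-poll for fast_us, then poll with sleep_us sleeps for up to slow_us.
 * Time is simulated by advancing *now_us; the caller zeroes *st.
 * Returns 0 when the register becomes ready, -1 on timeout.
 */
static int wait_two_phase(const struct fake_reg *r, uint64_t *now_us,
			  uint64_t fast_us, uint64_t slow_us,
			  uint64_t sleep_us, struct wait_stats *st)
{
	uint64_t fast_end = *now_us + fast_us;
	uint64_t slow_end = fast_end + slow_us;

	/* Phase 1: busy-poll, modelled as one check per simulated microsecond. */
	while (*now_us < fast_end) {
		if (reg_ready(r, *now_us))
			return 0;
		st->spins++;
		*now_us += 1;
	}

	/* Phase 2: sleep between checks, with one final check at expiry. */
	for (;;) {
		bool expired = *now_us >= slow_end;

		if (reg_ready(r, *now_us))
			return 0;
		if (expired)
			return -1;
		st->sleeps++;
		*now_us += sleep_us; /* stand-in for usleep_range() */
	}
}
```

Calling `wait_two_phase(&reg, &now, 2, 150, 10, &stats)` would correspond to a 2us busy-poll followed by a 150us sleeping wait, where the sleeping phase is the part that involves the scheduler.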

[snip]

But regardless, the fact that the fallback busy loop needs up to 34ms as
well makes the last bit from the above a bit uncertain. It would only hold if
the non-compliant polling we do somehow confuses the hardware, so that we end
up having to busy poll longer than we normally would. Probably unlikely.

I'm trying to get more info based on all this (in particular the KBL
problem) from Art. Until then I'd suggest increasing the WA timeout to
50ms, since that solved the problem for the bug reporter. We could fix
things/add more scaffolding if more evidence comes up, or if there is a new
bug report.

Yes, sure. I think I replied before that it is fine by me to push a 50ms
fix for stable.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx