Re: [PATCH] drm/i915: Replace some more busy waits with normal ones

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 23 Mar 2016 16:24:48 +0000

On 23/03/16 15:43, Mika Kuoppala wrote:
Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> writes:


I should have sent this as an RFC.

On 23/03/16 14:32, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

When I added an assert to catch non-atomic users of
wait_for_atomic_us in 0351b93992aa463cc3e7f358ddec2709f9390756
("drm/i915: Do not lie about atomic timeout granularity"),
I missed some callers which use it from an obviously
non-atomic context.

Replace them with sleeping waits, which have supported microsecond
timeout granularity since 3f177625ee896f5d3c62fa6a49554a9c0243bceb
("drm/i915: Add wait_for_us").

Note, however, that wait_for needs a fix to use a clock with
finer granularity than jiffies. In the above referenced patch
I switched the arguments to microseconds but failed to
upgrade the clock as well, as Mika later discovered.

An open question here is whether we should allow sleeping waits
of less than 10us, which the usleep_range guidance recommends against.
This patch actually touches one call site which asks for a 1us
timeout.

These might be better served by wait_for_atomic_us, in which
case the in-atomic warning there should be made dependent on
the requested timeout.

For discussion - does the above sound like a better plan than this
patch? To sum up my proposal:

What I have aimed for is that we have only wait_for and wait_for_atomic.

The sleeping one operates at 1ms granularity and the non-sleeping
one in microseconds.

Okay, if you think 1ms is enough for all callers.

1. Allow wait_for_atomic_us for < 10us waits and keep using it for such
waiters.

I have modified wait_for to do a few busy cycles at the
start of the wait and then an adaptive backoff if the condition is not
yet met, in the hope that we could convert a few atomic waits to it.

Sounds good.

2. Upgrade the clock in wait_for to something more precise than jiffies
so timeouts from 10us and up can be handled properly. Note that
currently this is only an issue in the failure/timeout mode. In the
expected case the current implementation is fine.

I would not go this route. If you really, really want a <1ms response,
this should be explicit at the call site. Disclaimer: I don't
know all the call sites and their requirements.

It is explicit; it is just currently broken on the timeout front.
But never mind: if the precision is really not required, then it is good
to get rid of it.

As under 1), put a BUILD_BUG_ON in wait_for to reject <10us waits.

This is what I had in mind (wip/rfc):
https://cgit.freedesktop.org/~miku/drm-intel/log/?h=wait_until

Spiced with your patch and a few BUILD_BUG_ONs, I think
wait_for_atomic(_us) might become a rare thing.

I had a brief look at your tree; it looks like a comprehensive approach
and a good one overall. I have spotted some small details, but I can
comment on them when you post it.

The biggest thing to make sure of is that you don't add a lot of cycles
to the forcewake loops since, for example, fw_domains_get can be the
hottest i915 function in some benchmarks.

(This area slightly annoys me anyway with its redundant looping over
forcewake domains. We could also potentially optimize the ack waiting
by first requesting all the domains we want and then doing the waits.
That would add one loop, but if we removed the other redundant one, the
code would stay at the same number of domain loops.)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx