Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 20 Feb 2023 16:51:42 +0000

On 20/02/2023 16:44, Tvrtko Ursulin wrote:
On 20/02/2023 15:52, Rob Clark wrote:
On Mon, Feb 20, 2023 at 3:33 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:

On 17/02/2023 20:45, Rodrigo Vivi wrote:
[snip]

Yeah I agree. And as not all media use cases are the same, and neither are
all compute contexts, someone somewhere will need to run a series of
workloads for power and performance numbers. Ideally that someone would
be the entity for which it makes sense to look at all use cases, from
server room to client, 3d, media and compute for both. If we could get
the capability to run this in some automated fashion, akin to CI, we
would even have a chance to keep making good decisions in the future.

Or we do some one off testing for this instance, but we still need a
range of workloads and parts to do it properly..

I also think the "arms race" scenario isn't really as much of a
problem as you think.  There aren't _that_ many things using the GPU
at the same time (compared to # of things using CPU).   And a lot of
mobile games throttle framerate to avoid draining your battery too
quickly (after all, if your battery is dead you can't keep buying loot
boxes or whatever).
Very good point.
On this one I still disagree from the point of view that it does not
make it good uapi if we allow everyone to select themselves for priority
handling (one flavour or the other).
There is plenty of precedent for userspace giving hints to the kernel
about scheduling and freq mgmt.  Like schedutil uclamp stuff.
Although I think that is all based on cgroups.
I knew about SCHED_DEADLINE and that it requires CAP_SYS_NICE, but I did 
not know about uclamp. Quick experiment with uclampset suggests it 
indeed does not require elevated privilege. If that is indeed so, it is 
good enough for me as a precedent.
It appears to work using sched_setscheduler so maybe could define 
something similar in i915/xe, per context or per client, not sure.
Maybe it would start as a primitive implementation but the uapi would 
not preclude making it smart(er) afterwards. Or passing along to GuC to 
do its thing with it.
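
For reference, a minimal sketch of how a task might raise its own
uclamp.min. As far as I know this goes through sched_setattr(2) with
SCHED_FLAG_UTIL_CLAMP_MIN rather than sched_setscheduler(2). The struct
layout and flag values follow the Linux uapi headers; the helper names
and the two-entry syscall number table are assumptions made purely for
illustration, and the kernel needs CONFIG_UCLAMP_TASK for the call to
succeed.

```python
import ctypes
import os
import platform

# Flag values as in include/uapi/linux/sched.h.
SCHED_FLAG_KEEP_POLICY = 0x08
SCHED_FLAG_KEEP_PARAMS = 0x10
SCHED_FLAG_UTIL_CLAMP_MIN = 0x20

# Assumption: syscall numbers for two common architectures only.
_NR_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274}


class SchedAttr(ctypes.Structure):
    """struct sched_attr, SCHED_ATTR_SIZE_VER1 layout (56 bytes)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),
        ("sched_deadline", ctypes.c_uint64),
        ("sched_period", ctypes.c_uint64),
        ("sched_util_min", ctypes.c_uint32),
        ("sched_util_max", ctypes.c_uint32),
    ]


def build_uclamp_min_attr(util_min):
    """Build an attr that only changes uclamp.min (valid range 0..1024)."""
    if not 0 <= util_min <= 1024:
        raise ValueError("util_min must be within 0..1024")
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    # Keep the current policy and parameters; touch only the min clamp.
    attr.sched_flags = (SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
                        SCHED_FLAG_UTIL_CLAMP_MIN)
    attr.sched_util_min = util_min
    return attr


def set_uclamp_min(util_min, pid=0):
    """Apply the clamp to pid (0 means the calling thread)."""
    nr = _NR_SCHED_SETATTR[platform.machine()]
    libc = ctypes.CDLL(None, use_errno=True)
    attr = build_uclamp_min_attr(util_min)
    if libc.syscall(nr, pid, ctypes.byref(attr), 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling set_uclamp_min(512) from an unprivileged process would be the
rough equivalent of the uclampset experiment mentioned above.
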
Hmmm having said that, how would we fix clvk performance using that? We 
would either need the library to do a new step when creating contexts, 
or allow external control so an outside entity can do it. And then the 
question is: based on what would it decide to do so? Is it possible to know 
which, for instance, Chrome tab will be (or is) using clvk so that tab 
management code does it?
Regards,

Tvrtko

In the fence/syncobj case, I think we need per-wait hints.. because
for a single process the driver will be doing both housekeeping waits
and potentially urgent waits.  There may also be some room for some
cgroup or similar knobs to control things like what max priority an
app can ask for, and whether or how aggressively the kernel responds
to the "deadline" hints.  So as far as "arms race", I don't think I'd
Per wait hints are okay I guess even with "I am important" in their name 
if sched_setscheduler allows raising uclamp.min just like that. In which 
case cgroup limits to mimic cpu uclamp also make sense.
change anything about my "fence deadline" proposal.. but that it might
just be one piece of the overall puzzle.
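
To make the "per-wait hint plus cgroup knob" idea concrete, here is a
toy model of how the two could combine. Everything in it is
hypothetical: ClientLimits, max_boost, honour_deadlines and
effective_boost are names invented for this sketch and do not
correspond to any existing kernel or driver interface.

```python
from dataclasses import dataclass


@dataclass
class ClientLimits:
    """Hypothetical per-cgroup knobs set by an administrator."""
    max_boost: int = 1024          # highest boost level a client may request
    honour_deadlines: bool = True  # whether urgent-wait hints are acted on


def effective_boost(requested, limits):
    """Clamp a per-wait boost request to what the limits allow."""
    if not limits.honour_deadlines:
        return 0  # hints ignored entirely for this group
    return max(0, min(requested, limits.max_boost))


# A housekeeping wait asks for nothing, an urgent wait asks for the maximum.
background = ClientLimits(max_boost=256)
print(effective_boost(0, background))     # housekeeping wait
print(effective_boost(1024, background))  # urgent wait, capped by the group
```

The point of the split is that the application only states urgency per
wait, while the ceiling stays under outside control, much like
cpu.uclamp.max bounds what a task can request for itself.
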
That SCHED_DEADLINE requires CAP_SYS_NICE does not worry you?

Regards,

Tvrtko