Re: [RFC PATCH 0/4] uapi, drm: Add and implement RLIMIT_GPUPRIO

Daniel Vetter <daniel@xxxxxxxx> · Wed, 5 Apr 2023 10:28:45 +0200

On Tue, 4 Apr 2023 at 12:45, Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
>
> Hi,
>
> On 03/04/2023 20:40, Joshua Ashton wrote:
> > Hello all!
> >
> > I would like to propose a new API for allowing processes to control
> > the priority of GPU queues similar to RLIMIT_NICE/RLIMIT_RTPRIO.
> >
> > The main reason for this is for compositors such as Gamescope and
> > SteamVR vrcompositor to be able to create realtime async compute
> > queues on AMD without the need of CAP_SYS_NICE.
> >
> > The current situation is bad for a few reasons, one being that in order
> > to setcap the executable, typically one must run as root which involves
> > a pretty high privelage escalation in order to achieve one
> > small feat, a realtime async compute queue queue for VR or a compositor.
> > The executable cannot be setcap'ed inside a
> > container nor can the setcap'ed executable be run in a container with
> > NO_NEW_PRIVS.
> >
> > I go into more detail in the description in
> > `uapi: Add RLIMIT_GPUPRIO`.
> >
> > My initial proposal here is to add a new RLIMIT, `RLIMIT_GPUPRIO`,
> > which seems to make most initial sense to me to solve the problem.
> >
> > I am definitely not set that this is the best formulation however
> > or if this should be linked to DRM (in terms of it's scheduler
> > priority enum/definitions) in any way and and would really like other
> > people's opinions across the stack on this.
> >
> > Once initial concern is that potentially this RLIMIT could out-live
> > the lifespan of DRM. It sounds crazy saying it right now, something
> > that definitely popped into my mind when touching `resource.h`. :-)
> >
> > Anyway, please let me know what you think!
> > Definitely open to any feedback and advice you may have. :D
>
> Interesting! I tried to solved the similar problem two times in the past already.
>
> First time I was proposing to tie nice to DRM scheduling priority [1] - if the latter has been left at default - drawing the analogy with the nice+ionice handling. That was rejected and I was nudged towards the cgroups route.
>
> So with that second attempt I implemented a hierarchical opaque drm.priority cgroup controller [2]. I think it would allow you to solve your use case too by placing your compositor in a cgroup with an elevated priority level.
>
> Implementation wise in my proposal it was left to individual drivers to "meld" the opaque cgroup drm.priority with the driver specific priority concept.
>
> That too wasn't too popular with the feedback (AFAIR) that the priority is a too subsystem specific concept.
>
> Finally I was left with a weight based drm cgroup controller, exactly following the controls of the CPU and IO ones, but with much looser runtime guarantees. [3]
>
> I don't think this last one works for your use case, at least not at the current state for drm scheduling capability, where the implementation is a "bit" too reactive for realtime.
>
> Depending on how the discussion around your rlimit proposal goes, perhaps one alternative could be to go the cgroup route and add an attribute like drm.realtime. That perhaps sounds abstract and generic enough to be passable. Built as a simplification of [2] it wouldn't be too complicated.
>
> On the actual proposal of RLIMIT_GPUPRIO...
>
> The name would be problematic since we have generic hw accelerators (not just GPUs) under the DRM subsystem. Perhaps RLIMIT_DRMPRIO would be better but I think you will need to copy some more mailing lists and people on that one. Because I can imagine one or two more fundamental questions this opens up, as you have eluded in your cover letter as well.

So I don't want to get into the bikeshed, I think Tvrtko summarized
pretty well that this is a hard problem with lots of attempts (I think
some more from amd too). I think what we need are two pieces here
really:
- A solid summary of all the previous attempts from everyone in this
space of trying to manage gpu compute resources (all the various
cgroup attempts, sched priority), listening the pros/cons. There's
also the fdinfo stuff just for reporting gpu usage which blew up kinda
badly and didn't have much discussion among all the stakeholders.
- Everyone on cc who's doing new drivers using drm/sched (which I
think is everyone really, or using that currently. So that's like
etnaviv, lima, amd, intel with the new xe, probably new nouveau driver
too, amd ofc, panfrost, asahi. Please cc everyone.

Unless we do have some actual rough consens in this space across all
stakeholders I think all we'll achieve is just yet another rfc that
goes nowhere. Or maybe something like the minimal fdinfo stuff
(minimal I guess to avoid wider discussion) which then blew up because
it wasn't thought out well enough.

Adding at least some of the people who probably should be cc'ed on
this. Please add more.

Cheers, Daniel

>
> Regards,
>
> Tvrtko
>
> [1] https://lore.kernel.org/dri-devel/20220407152806.3387898-1-tvrtko.ursulin@xxxxxxxxxxxxxxx/T/
> [2] https://lore.kernel.org/lkml/20221019173254.3361334-4-tvrtko.ursulin@xxxxxxxxxxxxxxx/T/#u
> [3] https://lore.kernel.org/lkml/20230314141904.1210824-1-tvrtko.ursulin@xxxxxxxxxxxxxxx/

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch