One additional thought on one sub-topic:
On 27/07/2023 18:08, Tvrtko Ursulin wrote:
[snip]
For something like this, you would probably want it to work inside
the drm scheduler first. Presumably, this can be done by setting a
weight on each runqueue, and perhaps adding a callback to update one
for a running queue. Calculating the weights hierarchically might be
fun...
It does not need to work in the drm scheduler first. In fact, drm
scheduler based drivers can plug into what I have, since it already
has the notion of scheduling priorities.
They would only need to implement a hook which allows the cgroup
controller to query client GPU utilisation, and another to receive
the over-budget signal.
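Roughly something along these lines; a sketch only, with names chosen
for illustration rather than taken from the series:

#include <linux/types.h>

struct drm_file;

/* Illustrative driver hooks for the DRM cgroup controller. */
struct drm_cgroup_ops {
	/* Return the client's accumulated GPU time in micro-seconds. */
	u64 (*active_time_us)(struct drm_file *file_priv);

	/* Notify the driver that the client's cgroup is over its budget. */
	void (*signal_budget)(struct drm_file *file_priv, u64 used_us,
			      u64 budget_us);
};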
Amdgpu and msm AFAIK could be easy candidates because they both
support per-client utilisation and priorities.
Looks like I need to put all this info back into the cover letter.
Also, hierarchical weights and time budgets are all already there. What
could be done later is to make this all smarter and respect the time
budget with more precision. That would, however, in many cases
including Intel, require co-operation with the firmware. In any case
it is only work in the implementation, while the cgroup control
interface remains the same.
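For the hierarchical part, the budget split can be as simple as dividing
a parent's period among its children in proportion to their weights,
applied recursively. A sketch only, with made-up structure and helper
names:

#include <linux/list.h>
#include <linux/math64.h>

/* Made-up bookkeeping node, for illustration only. */
struct drmcg_node {
	struct list_head children;
	struct list_head sibling;
	u64 weight;
	u64 budget_us;
};

/* Split @period_us among @parent's children proportionally to weight. */
static void drmcg_distribute_budget(struct drmcg_node *parent, u64 period_us)
{
	struct drmcg_node *child;
	u64 total_weight = 0;

	list_for_each_entry(child, &parent->children, sibling)
		total_weight += child->weight;

	if (!total_weight)
		return;

	list_for_each_entry(child, &parent->children, sibling) {
		child->budget_us = div64_u64(period_us * child->weight,
					     total_weight);
		/* Grandchildren then share their parent's slice. */
		drmcg_distribute_budget(child, child->budget_us);
	}
}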
I have taken a look at how the rest of the cgroup controllers change
ownership when moved to a different cgroup, and the answer was: not
at all. If we attempt to create the scheduler controls only the
first time the fd is used, you could probably get rid of all the
tracking.
Can you send a CPU file descriptor from process A to process B and
have CPU usage belonging to process B show up in process A's cgroup,
or vice-versa? Nope, I am not making any sense, am I? My point being
it is not like-for-like; the model is different.
No ownership transfer would mean that in wide deployments all GPU
utilisation would be assigned to Xorg, and so there would be no point
to any of this. There would be no way to throttle a cgroup with
unimportant GPU clients, for instance.
If you just grab the current process' cgroup when a drm_sched_entity
is created, you don't have everything charged to X.org. No need for
complicated ownership tracking in drm_file. The equivalent should be
done in i915 as well when a context is created, as it's not using
the drm scheduler.
Okay, so essentially nuking the concept that a DRM client belongs to one
cgroup and instead tracking at the context level. That is an interesting
idea. I suspect the implementation could require somewhat generalizing the
concept of an "execution context", or at least expressing it via the DRM
cgroup controller.
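Purely as a sketch of where the charging point would move to, assuming a
hypothetical DRM controller id and per-entity field (neither exists
upstream):

#include <linux/cgroup.h>
#include <linux/sched.h>

/* Hypothetical per-entity bookkeeping; illustration only. */
struct drm_sched_entity_cgroup {
	struct cgroup_subsys_state *css;
};

/* Capture the creating task's cgroup at entity creation time. */
static void drm_sched_entity_track_cgroup(struct drm_sched_entity_cgroup *track)
{
	/*
	 * task_get_css() pins the css so it stays valid for the entity's
	 * lifetime; "drm_cgrp_id" assumes a registered DRM controller,
	 * which does not exist upstream.
	 */
	track->css = task_get_css(current, drm_cgrp_id);
}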
I can give this a spin, or at least some more detailed thought, once we
close on a few more details regarding charging in general.
I didn't get much time to brainstorm this just yet; only one downside
randomly came to mind later: with this approach, for i915 we wouldn't
correctly attribute any GPU activity done in the receiving process
against our default contexts *). Those would still be accounted to the
sending process.
How much of a problem that would be in practice remains to be
investigated, including whether it applies to other drivers too. If
there is a good amount of deployed userspace which uses the default
context, then it would be a bit messy.
Regards,
Tvrtko
*) For non-DRM and non-i915 people, the default context is a GPU
submission context implicitly created when the device node is opened.
It always remains valid, including in the receiving process if the fd
is passed via SCM_RIGHTS.
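For completeness, the handover in question is plain SCM_RIGHTS file
descriptor passing over a Unix domain socket, roughly like the userspace
snippet below (illustration only; process A has already opened the DRM
node, which created the default context):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Process A side: hand the already-open DRM fd over to process B. */
static int send_drm_fd(int unix_sock, int drm_fd)
{
	char dummy = 'x';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u = { 0 };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &drm_fd, sizeof(int));

	/* The default context created at open() travels with the fd. */
	return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}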