Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

Friedrich Vock <friedrich.vock@xxxxxx> · Mon, 13 May 2024 15:44:58 +0200

Hi,

On 02.05.24 16:23, Maarten Lankhorst wrote:
Hey,

[snip]

For Xe, I've been loking at using cgroups. A small prototype is
available at

https://cgit.freedesktop.org/~mlankhorst/linux/log/?h=dumpcg

To stimulate discussion, I've added amdgpu support as well.
This should make it possible to isolate the compositor allocations
from the target program.

This support is still incomplete and covers vram only, but I need help
from userspace and consensus from other drivers on how to move forward.

I'm thinking of making 3 cgroup limits:
1. Physical memory, each time a buffer is allocated, it counts towards
it, regardless where it resides.
2. Mappable memory, all buffers allocated in sysmem or vram count
towards this limit.
3. VRAM, only buffers residing in VRAM count here.

This ensures that VRAM can always be evicted to sysmem, by having a
mappable memory quota, and having a sysmem reservation.

The main trouble is that when evicting, you want to charge the
original process the changes in allocation limits, but it should be
solvable.

I've been looking for someone else needing the usecase in a different
context, so let me know what you think of the idea.

Sorry for the late reply. The idea sounds really good! I think cgroups
are great fit for what we'd need to prioritize game+compositor over
other potential non-foreground apps.

From what I can tell looking through the code, the current cgroup
properties are absolute memory sizes that userspace asks the kernel to
restrict the cgroup usage to?
While that sounds useful for some usecases too, I'm not sure just these
limits are a good solution for making sure that your compositor's and
foreground app's resources stay in memory (in favor of background apps)
when there is pressure.

This can be generalized towards all uses of the GPU, but the
compositor vs game thrashing is a good example of why it is useful to
have.

IIRC Tvrtko's original proposal was about per-cgroup DRM scheduling
priorities providing lower submission latency for prioritized cgroups,
right?

I think what we need here would pretty much exactly such a priority
system, but for memory: The cgroup containing the foreground app/game
and the compositor should have some hint telling TTM to try its hardest
to avoid evicting its buffers (i.e. a high memory priority).
Your existing drm_cgroup work looks like a great base for this, and I'd
be happy to help/participate with the implementation for amdgpu.

Thanks,
Friedrich

I should still have my cgroup testcase somewhere, this is only a
rebase of my previous proposal, but I think it fits the usecase.

Cheers,
Maarten