Hi,

On 02.05.24 16:23, Maarten Lankhorst wrote:
Hey,

[snip]

For Xe, I've been looking at using cgroups. A small prototype is available at
https://cgit.freedesktop.org/~mlankhorst/linux/log/?h=dumpcg

To stimulate discussion, I've added amdgpu support as well. This should make it possible to isolate the compositor allocations from the target program.

This support is still incomplete and covers vram only, but I need help from userspace and consensus from other drivers on how to move forward. I'm thinking of making 3 cgroup limits:

1. Physical memory: each time a buffer is allocated, it counts towards this limit, regardless of where it resides.
2. Mappable memory: all buffers allocated in sysmem or vram count towards this limit.
3. VRAM: only buffers residing in VRAM count here.

This ensures that VRAM can always be evicted to sysmem, by having a mappable memory quota, and having a sysmem reservation.

The main trouble is that when evicting, you want to charge the changes in allocation limits to the original process, but it should be solvable.

I've been looking for someone else needing the usecase in a different context, so let me know what you think of the idea.
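To make the three-limit accounting above easier to discuss, here is a minimal user-space sketch in Python. It is not code from the dumpcg prototype: the class `CgroupLimits`, its methods, and the placement names (including `OTHER`, which stands for any placement that is neither sysmem nor VRAM) are all hypothetical and only model the counting rules as described.

```python
from dataclasses import dataclass

# Hypothetical placement names, for illustration only.
SYSMEM, VRAM, OTHER = "sysmem", "vram", "other"


@dataclass
class CgroupLimits:
    """Toy model of the three proposed limits (all names are assumptions)."""
    physical_max: int
    mappable_max: int
    vram_max: int
    physical: int = 0
    mappable: int = 0
    vram: int = 0

    @staticmethod
    def _cost(size, placement):
        """Return the (physical, mappable, vram) bytes a buffer counts for."""
        return (
            size,                                        # counts wherever it resides
            size if placement in (SYSMEM, VRAM) else 0,  # sysmem or VRAM
            size if placement == VRAM else 0,            # VRAM only
        )

    def _apply(self, dp, dm, dv):
        """Apply the deltas only if no limit would be exceeded."""
        p, m, v = self.physical + dp, self.mappable + dm, self.vram + dv
        if p > self.physical_max or m > self.mappable_max or v > self.vram_max:
            return False
        self.physical, self.mappable, self.vram = p, m, v
        return True

    def try_charge(self, size, placement):
        """Charge a newly allocated buffer against all limits it counts for."""
        return self._apply(*self._cost(size, placement))

    def try_move(self, size, src, dst):
        """Re-charge a buffer that moves between placements (e.g. eviction)."""
        old = self._cost(size, src)
        new = self._cost(size, dst)
        return self._apply(*(n - o for n, o in zip(new, old)))
```

In this model, moving a buffer from VRAM to sysmem leaves the physical and mappable counters unchanged and only lowers the VRAM counter, so such an eviction can never fail on a limit. That is one reading of why a mappable quota keeps VRAM evictable; which process gets charged on eviction is left out here.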
Sorry for the late reply. The idea sounds really good! I think cgroups are a great fit for what we'd need to prioritize game+compositor over other potential non-foreground apps.

From what I can tell looking through the code, the current cgroup properties are absolute memory sizes that userspace asks the kernel to restrict the cgroup usage to? While that sounds useful for some usecases too, I'm not sure these limits alone are a good solution for making sure that your compositor's and foreground app's resources stay in memory (in favor of background apps) when there is pressure.
This can be generalized towards all uses of the GPU, but the compositor vs game thrashing is a good example of why it is useful to have.
IIRC Tvrtko's original proposal was about per-cgroup DRM scheduling priorities providing lower submission latency for prioritized cgroups, right?

I think what we need here would be pretty much exactly such a priority system, but for memory: the cgroup containing the foreground app/game and the compositor should have some hint telling TTM to try its hardest to avoid evicting its buffers (i.e. a high memory priority).

Your existing drm_cgroup work looks like a great base for this, and I'd be happy to help/participate with the implementation for amdgpu.

Thanks,
Friedrich
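As a rough illustration of the memory-priority idea, the following Python sketch picks eviction victims from the lowest-priority cgroup first and falls back to least-recently-used order within a priority level. It is a toy model, not TTM's eviction logic: `Buffer`, its fields, and `pick_victims` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int
    priority: int   # hypothetical per-cgroup memory priority, higher = keep longer
    last_used: int  # hypothetical timestamp, lower = used longer ago


def pick_victims(buffers, bytes_needed):
    """Choose buffers to evict until at least bytes_needed would be freed.

    Lower-priority buffers go first; within one priority level the least
    recently used buffer goes first.
    """
    victims, freed = [], 0
    for buf in sorted(buffers, key=lambda b: (b.priority, b.last_used)):
        if freed >= bytes_needed:
            break
        victims.append(buf)
        freed += buf.size
    return victims
```

With this ordering, a compositor buffer in a high-priority cgroup survives memory pressure even when it is the least recently used buffer overall, as long as enough background buffers are available to evict.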
I should still have my cgroup testcase somewhere. This is only a rebase of my previous proposal, but I think it fits the usecase.

Cheers,
Maarten