cgroups are a core kernel mechanism that allows a system integrator /
system administrator to collect OS processes into a hierarchy of groups
according to their intended role in the overall system; resource
management and policy configuration can then be applied to each cgroup
independently. The DRM subsystem manages several concepts that would be
a good match for cgroup-level configuration.

This series adds infrastructure to allow DRM drivers to track
'parameters' associated with individual cgroups. These parameters can
be used to manage things like GPU priority, discrete/stolen memory
limits, etc.; drivers will be able to query the parameters set for a
process' cgroup and then apply appropriate driver-level policy.

The series is organized as follows:

 * Patches 1-4 export some additional interfaces from the cgroup core
   kernel implementation to make them accessible to modules and
   drivers.
 * Patch 5 introduces a new DRM ioctl that allows userspace to set
   parameter values for specific cgroups.
 * Patch 6 introduces a DRM helper library to simplify the
   allocation, storage, and fetching of per-cgroup driver-specific
   data.
 * Patch 7 adds a helper function to obtain the v2 cgroup of the
   process associated with a drm_file.
 * Patch 8 implements support for GPU priority as a cgroup parameter
   in the i915 driver.
 * Patch 9 adds context priority to i915's debugfs output to make it
   easier to verify that context priorities are being initialized as
   expected.


Anticipated questions / concerns
--------------------------------

Q: What's the userspace consumer of this?

A: I'll send a follow-up to the dri-devel / intel-gfx mailing lists
with a small patch that adds a simple command line tool to the libdrm
tests directory. Although it looks more like a simple test program
than a real consumer, I think it's about the only userspace we'll ever
want/need.
Keep in mind that the real "consumers" here aren't the graphics
applications themselves, but rather the system startup process (e.g., a
sysv-init script or systemd service). The startup scripts can shuffle
the various services/programs into appropriate cgroups and then make
some calls like:

    drm_set_cgrp_param /dev/dri/card0 /cgroup2/safety_critical/ 1 900
    drm_set_cgrp_param /dev/dri/card0 /cgroup2/high_priority/ 1 100
    drm_set_cgrp_param /dev/dri/card0 /cgroup2/best_effort/ 1 -200

to define the priority policy for each cgroup. Aside from initial
startup scripts, none of the actual graphics clients are expected to
touch this interface.

Q: The initial use case here is for setting i915 GPU priority according
to cgroup. How/why does this differ from existing priority mechanisms
(e.g., setting I915_CONTEXT_PARAM_PRIORITY via the
I915_GEM_CONTEXT_SETPARAM ioctl on individual GPU contexts)?

A: Existing mechanisms like the i915 context priority parameter will
ultimately be called by the software that priority is being assigned
for (e.g., a 3D application might use EGL_IMG_context_priority to
self-classify as high priority or low priority). However, the priority
of an application usually isn't a characteristic of the application
itself, but rather a decision that an admin/integrator makes from a
system-level perspective. cgroups provide a standard, convenient
mechanism for a system integrator to apply the specific policy needed
to build a cohesive system. Note that the cgroups support for i915
priority here just assigns the initial/default priority for GPU
contexts and doesn't block runtime adjustment of the priority via other
mechanisms.

Q: Do we really anticipate other DRM concepts (beyond GPU priority)
being a reasonable match for cgroups-style management/control?

A: I think there's a lot of potential to use cgroups to manage limits
on various types of "graphics memory" in the future.
That could either be things like stolen memory on i915 (granted, we
don't allow direct allocations of this from userspace today, but it's
been talked about in the past) or discrete video RAM on systems that
have that.

Q: Why is this implemented via DRM ioctl rather than as a cgroup
controller which would expose settings via kernfs nodes?

A: The kernel has a concept of 'cgroup controllers' for exposing
settings via virtual filesystem nodes. My initial thought was to expose
this kind of functionality as a driver-level cgroup controller so that,
for example, virtual files like "i915.priority" would appear in each
cgroup folder and be readable/writable directly. However, as of commit
3ed80a6 ("cgroup: drop module support"), controllers must be built
directly into the kernel; they can no longer be provided by modules.
There was some discussion about this direction at the time here:

    https://www.spinics.net/lists/cgroups/msg10077.html

and we discussed it recently again on the cgroups mailing list here:

    https://www.spinics.net/lists/cgroups/msg18672.html

The way I see it, usage of cgroups can pretty much be broken down into
two categories:

 (a) distribution/management of a limited resource across a hierarchy
     of processes, and
 (b) general policy/configuration setting for groups of processes.

The cgroup controller concept is really designed for category (a)
above, and a lot of work is done to take the cgroup hierarchy itself
into account, not just the details of the final leaf node. In
contrast, my initial use of cgroups for DRM drivers (i915 GPU priority)
falls into the second category: we're managing the GPU priority that
the scheduler makes use of rather than a share of GPU time. The
solution I've taken here (a driver/subsystem call that takes a cgroup
as a parameter and manages data locally) is closer in design to some
other areas of the kernel (like the BPF_PROG_ATTACH command accepted by
the bpf() system call).

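To make the "takes a cgroup as a parameter and manages data locally"
idea concrete, here is a minimal userspace sketch of a driver-private
parameter table. It is only an illustration under stated assumptions:
every name below (cgrp_param_table, cgrp_set_priority, and so on), the
fixed-size array, and the use of a plain integer as the cgroup
identifier are hypothetical and are not the helper API added by this
series. The cgrp_destroyed() function stands in for whatever the driver
does when it is notified that a cgroup has gone away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the series' actual API. */
#define MAX_CGROUPS 16

struct cgrp_param_entry {
    uint64_t cgroup_id;  /* opaque cgroup identifier (an assumption) */
    int64_t  priority;   /* driver-private parameter value */
    int      in_use;     /* nonzero while the slot holds live data */
};

struct cgrp_param_table {
    struct cgrp_param_entry entries[MAX_CGROUPS];
    int64_t default_priority;  /* used when a cgroup has no entry */
};

/* Return the live entry for cgroup_id, or NULL if none is stored. */
static struct cgrp_param_entry *
find_entry(struct cgrp_param_table *tbl, uint64_t cgroup_id)
{
    for (size_t i = 0; i < MAX_CGROUPS; i++)
        if (tbl->entries[i].in_use && tbl->entries[i].cgroup_id == cgroup_id)
            return &tbl->entries[i];
    return NULL;
}

/* "Set parameter" path: store or overwrite a value. Returns -1 if full. */
int cgrp_set_priority(struct cgrp_param_table *tbl, uint64_t cgroup_id,
                      int64_t prio)
{
    assert(tbl != NULL);
    struct cgrp_param_entry *e = find_entry(tbl, cgroup_id);

    if (!e) {
        /* No entry yet: claim the first free slot. */
        for (size_t i = 0; i < MAX_CGROUPS && !e; i++)
            if (!tbl->entries[i].in_use)
                e = &tbl->entries[i];
        if (!e)
            return -1;
        e->cgroup_id = cgroup_id;
        e->in_use = 1;
    }
    e->priority = prio;
    return 0;
}

/* Query path: what a driver would consult when creating a new context. */
int64_t cgrp_get_priority(struct cgrp_param_table *tbl, uint64_t cgroup_id)
{
    assert(tbl != NULL);
    struct cgrp_param_entry *e = find_entry(tbl, cgroup_id);

    return e ? e->priority : tbl->default_priority;
}

/* Cleanup path: drop the data once the cgroup has been destroyed. */
void cgrp_destroyed(struct cgrp_param_table *tbl, uint64_t cgroup_id)
{
    assert(tbl != NULL);
    struct cgrp_param_entry *e = find_entry(tbl, cgroup_id);

    if (e)
        e->in_use = 0;
}
```

Note that the lookup is flat: a cgroup without its own entry gets the
table default rather than a value derived from its ancestors, which is
the main way this kind of policy storage differs from a true resource
controller.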
It's possible that if/when we do start looking to cgroups for
graphics-specific memory management we will want to consider using a
true cgroup controller for that type of management (since it will fit
more into category (a) above as a true resource controller).
Resurrecting module-based controller support in the cgroup subsystem
will probably be some serious work, so I'll leave that until we have a
definite use case that needs it. For simpler policy (like GPU
priority), the approach here is probably a better direction forward.
Of course this is an initial RFC, so feedback is welcome!

Q: Why does the DRM cgroup support here restrict itself to the
cgroup-v2 hierarchy? Why not allow DRM parameters to be set on all the
cgroup-v1 hierarchies my distro has?

A: cgroups has two ABIs (a multi-hierarchy cgroup-v1 and a
single-hierarchy cgroup-v2). Both can co-exist and be used
simultaneously on a system, but cgroup-v1 is really for backward
compatibility, and cgroup-v2 is supposed to be the way of the future.
I restricted the support here to v2 mostly so that we wouldn't be
building on a legacy framework, but also because the multi-hierarchy
nature of v1 cgroups adds some extra complexity. When creating a new
GPU context, how would you decide which hierarchy to look up the
priority in? What if a process had different priority values set on
its cgroups in different hierarchies? It's easiest to just avoid the
confusion by sticking with the single v2 hierarchy.

Q: The patches here add support for "i915 priority." Should we simplify
this to a more general "GPU priority" that isn't driver-specific or
device-specific?

A: I opted for a device-specific approach here for a few reasons.
First, it doesn't seem unreasonable to have a multi-GPU system where
groups have different priorities for each GPU they can submit workloads
to.
Second, we already have multiple scheduler implementations in the DRM
tree (e.g., the shared "DRM scheduler" contributed by AMD and the Intel
i915 scheduler). These schedulers have different priority ranges and
expectations, so it might be confusing to try to map any general
purpose "GPU priority" range into the specific range used by an
individual scheduler, especially when driver-specific interfaces would
then be able to alter the priority further.

Q: Given the justification above, is "i915 priority" too high-level?
Should we allow priority to be set independently for different engines
within a single GPU (e.g., render prio != blit prio != video prio)?

A: Maybe? I'm open to feedback on this one. If we decide to stick with
a single i915 priority for now, we can always add per-engine priority
parameters in the future and update the code so that the existing
parameter (I915_CGRP_DEF_CONTEXT_PRIORITY) simply sets the priority for
all engines to the same value at once.

Q: What is the access control on this ioctl? Who/what is allowed to
set cgroup parameters?

A: I've tied the access to this ioctl to filesystem permissions on the
cgroup kernfs directory. If a process has write access on the
directory (meaning it can make other types of cgroup modifications),
then it can update cgroup parameters via the ioctl. I think this is
the most sensible way to handle access permission, but alternate
suggestions are welcome.


TODO
----
 - Add some i-g-t tests to exercise the ioctl interface, especially
   interaction with various cgroup operations (e.g., set parameter for
   a cgroup, then rmdir the cgroup directory).
 - Documentation: the new code here has a lot of kerneldoc embedded in
   it, but none of that is actually integrated into the rst files in
   the Documentation/gpu directory yet.

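As a rough userspace analogue of the access-control rule from the Q&A
above (write access on the cgroup directory gates parameter updates),
the check could be approximated as follows. This is a sketch under
assumptions, not the kernel-side check in the series:
may_set_cgroup_params() is a hypothetical name, and access(2) tests the
caller's real UID/GID, which may differ from what an in-kernel
permission check on the kernfs inode would evaluate.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if the caller could plausibly set
 * parameters on the cgroup rooted at cgroup_dir, i.e. the path is a
 * directory and the caller has write permission on it.
 */
bool may_set_cgroup_params(const char *cgroup_dir)
{
    struct stat st;

    assert(cgroup_dir != NULL);

    /* The target must exist and be a directory (a cgroup is one). */
    if (stat(cgroup_dir, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;

    /* Write access on the directory is the gate described above. */
    return access(cgroup_dir, W_OK) == 0;
}
```

A startup script's helper tool could call this before issuing the
ioctl to produce a clearer error message, although the kernel's own
check remains the authoritative one.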
Matt Roper (9):
  kernfs: Export kernfs_get_inode
  cgroup: Add notifier call chain for cgroup destruction
  cgroup: Export cgroup_on_dfl() to drivers
  cgroup: Export task_cgroup_from_root() and cgroup_mutex for drivers
  drm: Introduce DRM_IOCTL_CGROUP_SETPARAM
  drm: Add cgroup helper library
  drm: Add helper to obtain cgroup of drm_file's owning process
  drm/i915: Allow default context priority to be set via cgroup
    parameter
  drm/i915: Add context priority to debugfs

 drivers/gpu/drm/Makefile                |   2 +
 drivers/gpu/drm/drm_cgroup.c            | 120 ++++++++++++++++
 drivers/gpu/drm/drm_cgroup_helper.c     | 244 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/drm_ioctl.c             |   5 +
 drivers/gpu/drm/i915/Makefile           |   1 +
 drivers/gpu/drm/i915/i915_cgroups.c     | 162 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c     |   2 +
 drivers/gpu/drm/i915/i915_drv.c         |   4 +
 drivers/gpu/drm/i915/i915_drv.h         |  32 +++++
 drivers/gpu/drm/i915/i915_gem_context.c |   2 +-
 fs/kernfs/inode.c                       |   1 +
 include/drm/drm_cgroup.h                |  38 +++++
 include/drm/drm_cgroup_helper.h         | 153 ++++++++++++++++++++
 include/drm/drm_device.h                |  13 ++
 include/drm/drm_file.h                  |  28 ++++
 include/linux/cgroup.h                  |  10 +-
 include/uapi/drm/drm.h                  |  10 ++
 include/uapi/drm/i915_drm.h             |   9 ++
 kernel/cgroup/cgroup-internal.h         |   4 -
 kernel/cgroup/cgroup.c                  |  27 +++-
 20 files changed, 858 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_cgroup.c
 create mode 100644 drivers/gpu/drm/drm_cgroup_helper.c
 create mode 100644 drivers/gpu/drm/i915/i915_cgroups.c
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/drm/drm_cgroup_helper.h

--
2.14.3