On 03.09.19 at 10:02, Daniel Vetter wrote:
> On Thu, Aug 29, 2019 at 02:05:17AM -0400, Kenny Ho wrote:
>> This is a follow-up to the RFC I made previously to introduce a cgroup
>> controller for the GPU/DRM subsystem [v1,v2,v3]. The goal is to be able to
>> provide resource management for GPU resources using things like containers.
>>
>> With this RFC v4, I am hoping to reach some consensus on a merge plan. I believe
>> the GEM related resources (drm.buffer.*) introduced in the previous RFC and,
>> hopefully, the logical GPU concept (drm.lgpu.*) introduced in this RFC are
>> uncontroversial and ready to move out of RFC and into a more formal review. I
>> will continue to work on the memory backend resources (drm.memory.*).
>>
>> The cover letter from v1 is copied below for reference.
>>
>> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
>> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>> [v3]: https://lists.freedesktop.org/archives/amd-gfx/2019-June/036026.html
> So looking at all this, it doesn't seem to have changed much, and the old
> discussion didn't really conclude anywhere (aside from some details).
>
> One more open question, though, that crossed my mind, having read a ton of
> ttm again recently: how does this all interact with the ttm global limits?
> I'd say the ttm global limits are the ur-cgroup we have in drm, and not
> looking at that seems kinda bad.

At least my hope was to completely replace the ttm globals with the
limitations implemented here once this is ready.

Christian.

> -Daniel
>
>> v4:
>> Unchanged (no review needed)
>> * drm.memory.*/ttm resources (patches 9-13; I am still working on memory
>>   bandwidth and shrinker)
>> Based on feedback on v3:
>> * update nomenclature to drmcg
>> * embed per-device drmcg properties into drm_device
>> * split GEM buffer related commits into stats and limit
>> * rename functions to align with convention
>> * combine buffer accounting and limit check into a try_charge function
>>   (see the sketch below the changelog)
>> * support buffer stats without limit enforcement
>> * remove GEM buffer sharing limitation
>> * update documentation
>> New features:
>> * introduce logical GPU concept
>> * example implementation with AMD KFD
>>
>> v3:
>> Based on feedback on v2:
>> * removed .help type file from v2
>> * conform to cgroup convention for default and max handling
>> * conform to cgroup convention for addressing device-specific limits (with major:minor)
>> New functionality:
>> * adopted memparse for memory size related attributes
>> * added macro to marshal drmcgrp cftype private data (DRMCG_CTF_PRIV, etc.)
>> * added ttm buffer usage stats (per cgroup, for system, tt, vram)
>> * added ttm buffer usage limit (per cgroup, for vram)
>> * added per-cgroup bandwidth stats and limiting (burst and average bandwidth)
>>
>> v2:
>> * Removed the vendoring concepts
>> * Added a limit on total buffer allocation
>> * Added a limit on the maximum size of a buffer allocation
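
One v4 item above, folding buffer accounting and the limit check into a single
try_charge-style helper, is perhaps easier to see in code. Below is a minimal
userspace sketch of that pattern, not the actual patch API: the names
drmcg_bo_state, drmcg_try_charge and the field layout are illustrative
assumptions. The point is that a charge either succeeds against the configured
limit or fails without touching the accounted totals, so stats and enforcement
share one code path.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-cgroup, per-device buffer accounting state.
 * Field names are illustrative; they do not match the patch series. */
struct drmcg_bo_state {
	size_t total;   /* bytes currently charged to this cgroup */
	size_t peak;    /* high-water mark of `total` */
	size_t limit;   /* drm.buffer.total.max equivalent; 0 = no limit */
};

/* Combined accounting + limit check: either the whole allocation is
 * charged, or nothing changes and the caller must fail the allocation. */
static bool drmcg_try_charge(struct drmcg_bo_state *st, size_t size)
{
	if (st->limit && st->total + size > st->limit)
		return false;           /* over limit: reject, stats untouched */

	st->total += size;              /* charge the buffer */
	if (st->total > st->peak)
		st->peak = st->total;   /* track peak usage for peak stats */
	return true;
}

static void drmcg_uncharge(struct drmcg_bo_state *st, size_t size)
{
	st->total -= size;              /* called when the GEM buffer is freed */
}

int main(void)
{
	struct drmcg_bo_state st = { .limit = 256 << 20 };  /* 256 MiB cap */

	printf("charge 200M: %s\n", drmcg_try_charge(&st, 200u << 20) ? "ok" : "rejected");
	printf("charge 100M: %s\n", drmcg_try_charge(&st, 100u << 20) ? "ok" : "rejected");
	drmcg_uncharge(&st, 200u << 20);
	printf("total now %zu bytes, peak %zu bytes\n", st.total, st.peak);
	return 0;
}

The real controller would presumably charge up the cgroup hierarchy and use
proper locking or atomics; this only shows the single-level shape of the
accounting.
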
>> v1: cover letter
>>
>> The purpose of this patch series is to start a discussion for a generic cgroup
>> controller for the drm subsystem. The design proposed here is a very early one.
>> We are hoping to engage the community as we develop the idea.
>>
>>
>> Background
>> ==========
>> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
>> tasks, and all their future children, into hierarchical groups with specialized
>> behaviour, such as accounting/limiting the resources which processes in a
>> cgroup can access [1]. Weights, limits, protections and allocations are the
>> main resource distribution models. Existing cgroup controllers include cpu,
>> memory, io, rdma, and more. cgroup is one of the foundational technologies that
>> enables the popular container application deployment and management method.
>>
>> The Direct Rendering Manager/drm contains code intended to support the needs of
>> complex graphics devices. Graphics drivers in the kernel may make use of DRM
>> functions to make tasks like memory management, interrupt handling and DMA
>> easier, and to provide a uniform interface to applications. DRM has also
>> developed beyond traditional graphics applications to support compute/GPGPU
>> applications.
>>
>>
>> Motivations
>> ===========
>> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
>> data center clusters and IoT, there is an increasing need to monitor and
>> regulate GPUs as a resource, like cpu, memory and io.
>>
>> Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
>> purpose of managing GPU priority using the cgroup hierarchy. While that
>> particular use case may not warrant a standalone drm cgroup controller, there
>> are other use cases where having one can be useful [3]. Monitoring GPU
>> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
>> (execution unit [Intel's nomenclature]) usage, and GPU job scheduling [4] can
>> help sysadmins get a better understanding of the applications' usage profiles.
>> Further regulation of the aforementioned resources can also help sysadmins
>> optimize workload deployment on limited GPU resources.
>>
>> With the increased importance of machine learning, data science and other
>> cloud-based applications, GPUs are already in production use in data centers
>> today [5,6,7]. Existing GPU resource management is very coarse-grained,
>> however, as sysadmins are only able to distribute workloads on a per-GPU
>> basis [8]. An alternative is to use GPU virtualization (with or without
>> SR-IOV), but it generally acts on the entire GPU instead of the specific
>> resources in a GPU. With a drm cgroup controller, we can enable alternative,
>> fine-grained, sub-GPU resource management (in addition to what may be
>> available via GPU virtualization).
>>
>> In addition to production use, the DRM cgroup can also help with testing
>> graphics application robustness by providing a means to artificially limit the
>> DRM resources available to the applications.
>>
>>
>> Challenges
>> ==========
>> While there is common infrastructure in DRM that is shared across many vendors
>> (the scheduler [4], for example), there are also aspects of DRM that are vendor
>> specific. To accommodate this, we borrowed the mechanism used by cgroup to
>> handle different kinds of cgroup controllers.
>>
>> Resources for DRM are also often device (GPU) specific rather than system
>> specific, and a system may contain more than one GPU. For this, we borrowed
>> some of the ideas from the RDMA cgroup controller (see the per-device limit
>> sketch after the Approach section).
>>
>> Approach
>> ========
>> To experiment with the idea of a DRM cgroup, we would like to start with basic
>> accounting and statistics, then continue to iterate and add regulating
>> mechanisms into the driver.
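
As the Challenges section notes, DRM resources are per-device, so, following
the v3 changes, limit files take entries keyed by the device's major:minor
number with memparse-style size suffixes. The sketch below is a rough
userspace model of parsing one such entry, e.g. "226:0 256m"; the file name
drm.buffer.total.max in the comment is illustrative, and parse_size() is only
a stand-in for the kernel's memparse().

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's memparse(): parse a number with an
 * optional k/m/g suffix (powers of two). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'g': case 'G': v <<= 10; /* fall through */
	case 'm': case 'M': v <<= 10; /* fall through */
	case 'k': case 'K': v <<= 10; (*end)++; break;
	}
	return v;
}

/* Parse one "major:minor size" entry as it might be written to a
 * per-device limit file such as drm.buffer.total.max (name illustrative). */
static int parse_limit_entry(const char *line, unsigned *major,
			     unsigned *minor, unsigned long long *limit)
{
	char *end;

	*major = strtoul(line, &end, 10);
	if (*end != ':')
		return -EINVAL;
	*minor = strtoul(end + 1, &end, 10);
	if (*end != ' ')
		return -EINVAL;

	if (!strncmp(end + 1, "max", 3)) {	/* cgroup convention: "max" = no limit */
		*limit = ~0ULL;
		return 0;
	}
	*limit = parse_size(end + 1, &end);
	return 0;
}

int main(void)
{
	unsigned major, minor;
	unsigned long long limit;

	if (!parse_limit_entry("226:0 256m", &major, &minor, &limit))
		printf("device %u:%u limited to %llu bytes\n", major, minor, limit);
	return 0;
}

In the kernel, the major:minor pair would presumably be resolved to a
drm_minor/drm_device and the limit stored in that device's per-cgroup state;
here we just print the parsed values.
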
>>
>> [1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
>> [2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
>> [3] https://www.spinics.net/lists/cgroups/msg20720.html
>> [4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
>> [5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
>> [6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
>> [7] https://github.com/RadeonOpenCompute/k8s-device-plugin
>> [8] https://github.com/kubernetes/kubernetes/issues/52757
>>
>> Kenny Ho (16):
>>   drm: Add drm_minor_for_each
>>   cgroup: Introduce cgroup for drm subsystem
>>   drm, cgroup: Initialize drmcg properties
>>   drm, cgroup: Add total GEM buffer allocation stats
>>   drm, cgroup: Add peak GEM buffer allocation stats
>>   drm, cgroup: Add GEM buffer allocation count stats
>>   drm, cgroup: Add total GEM buffer allocation limit
>>   drm, cgroup: Add peak GEM buffer allocation limit
>>   drm, cgroup: Add TTM buffer allocation stats
>>   drm, cgroup: Add TTM buffer peak usage stats
>>   drm, cgroup: Add per cgroup bw measure and control
>>   drm, cgroup: Add soft VRAM limit
>>   drm, cgroup: Allow more aggressive memory reclaim
>>   drm, cgroup: Introduce lgpu as DRM cgroup resource
>>   drm, cgroup: add update trigger after limit change
>>   drm/amdgpu: Integrate with DRM cgroup
>>
>>  Documentation/admin-guide/cgroup-v2.rst        |  163 +-
>>  Documentation/cgroup-v1/drm.rst                |    1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h     |    4 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c        |   29 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c     |    6 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c        |    3 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c       |    6 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h          |    3 +
>>  .../amd/amdkfd/kfd_process_queue_manager.c     |  140 ++
>>  drivers/gpu/drm/drm_drv.c                      |   26 +
>>  drivers/gpu/drm/drm_gem.c                      |   16 +-
>>  drivers/gpu/drm/drm_internal.h                 |    4 -
>>  drivers/gpu/drm/ttm/ttm_bo.c                   |   93 ++
>>  drivers/gpu/drm/ttm/ttm_bo_util.c              |    4 +
>>  include/drm/drm_cgroup.h                       |  122 ++
>>  include/drm/drm_device.h                       |    7 +
>>  include/drm/drm_drv.h                          |   23 +
>>  include/drm/drm_gem.h                          |   13 +-
>>  include/drm/ttm/ttm_bo_api.h                   |    2 +
>>  include/drm/ttm/ttm_bo_driver.h                |   10 +
>>  include/linux/cgroup_drm.h                     |  151 ++
>>  include/linux/cgroup_subsys.h                  |    4 +
>>  init/Kconfig                                   |    5 +
>>  kernel/cgroup/Makefile                         |    1 +
>>  kernel/cgroup/drm.c                            | 1367 +++++++++++++++++
>>  25 files changed, 2193 insertions(+), 10 deletions(-)
>>  create mode 100644 Documentation/cgroup-v1/drm.rst
>>  create mode 100644 include/drm/drm_cgroup.h
>>  create mode 100644 include/linux/cgroup_drm.h
>>  create mode 100644 kernel/cgroup/drm.c
>>
>> --
>> 2.22.0
>>