+dri-devel list, since a lot of the relevant audience is on that list.

On Mon, Oct 29, 2018 at 07:49:13PM -0400, Kenny Ho wrote:
> (Resending in plain text)
>
> Hi,
>
> We are thinking of using cgroup to manage resources in GPUs.  I
> believe Matt Roper from Intel has been trying to do something similar.
> From previous discussions
> (https://www.spinics.net/lists/cgroups/msg18687.html), the cgroup
> framework appears not to want a full-fledged cgroup controller for
> Matt's use case, but I am not sure I understood the rationale.  It's
> also unclear to me whether our use case matches Matt's.  We are hoping
> to get a better understanding of the situation before embarking on a
> path that may ultimately be unacceptable to upstream.  To that end, I
> will outline our (AMD) use case at a high level and perhaps folks on
> this list can give some suggestions?
>
> Our use case comes from the world of data centers and clusters.
> Currently we have a rudimentary mechanism to expose GPUs to a
> container cluster running Kubernetes
> (https://github.com/RadeonOpenCompute/k8s-device-plugin), but it only
> exposes whole GPUs.  That means multiple containers cannot share the
> same GPU.  A well-established way to share a GPU is
> SR-IOV/virtualization, but that shares the GPU in time slices.
>
> An alternative is to share the GPU by its constituents.  Perhaps a
> good way to think about this is to treat the GPU like a mini-computer:
> a GPU has memory (VRAM) and it also has compute units (but instead of
> tens of cores, it has hundreds to thousands of shaders/CUs).  So we
> can potentially share a GPU along those two dimensions.  Like a
> computer, a GPU also has specialized hardware, so we could potentially
> share that separately as well.
>
> Unlike a computer, however, GPUs are not as well "standardized" as a
> desktop or a server.  In the gpu/drm subsystem there are some things
> that are common (such as buffer sharing and buffer lifetime
> management), some things that are shared by only some vendors (a
> software scheduler), and some things that are very much vendor
> specific.  Because of this, a generic cgroup controller for drm may
> need to be more pluggable than other cgroup controllers.  We took a
> look at the rdma cgroup as part of our research, but rdma appears to
> have resources that are more abstracted and standardized.
>
> What do you think?  Does drm/gpu warrant its own full-fledged cgroup
> controller?
>
> Regards,
> Kenny Ho

Hi Kenny.

My drm+cgroups work from earlier this year has been on pause since I got
pulled away to focus on some other, higher-priority tasks.  What I was
working on previously still has value to various parts of Intel, so I do
plan to return to it eventually if nobody else jumps in first; I'm just
not sure exactly when I'll have time to get back to it.

In general, there are several areas where gpu and drm subsystem behavior
could interact in some way with cgroup membership.  Some aspects of
graphics behavior are a good match for control via a true cgroup
controller, whereas others probably make more sense as driver or drm
core interfaces that simply pay attention to the cgroup membership of a
process.

A real cgroup controller is probably what we'd want for concepts that
map well to the hierarchical structure of cgroups and that can be
handled via one of the four models described in the "Resource
Distribution Models" section of
Documentation/admin-guide/cgroup-v2.rst.
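Just to make that distinction concrete, here is roughly what the
"limits" and "weights" models could look like from userspace for a
hypothetical drm controller.  None of these interface files exist
today; "drm.memory.max" and "drm.weight" are made-up names that simply
mirror the memory.max / cpu.weight conventions from cgroup-v2.rst, and
the cgroup path is invented for the example:

/*
 * Purely illustrative sketch: configure a *hypothetical* drm cgroup
 * controller through cgroup-v2 interface files.  Neither "drm.memory.max"
 * nor "drm.weight" exists today; the names just mirror the memory.max /
 * cpu.weight conventions, and the cgroup path is made up.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	/* "limits" model: hard cap on GPU memory for this cgroup (256 MiB) */
	cg_write("/sys/fs/cgroup/containers/app1/drm.memory.max", "268435456");

	/* "weights" model: proportional share of GPU time */
	cg_write("/sys/fs/cgroup/containers/app1/drm.weight", "200");

	return 0;
}

The point is only that each of those models boils down to a single
value written into an interface file of the cgroup in question, which
is the general shape a real controller would end up exposing.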
Off the top of my head, the graphics concepts that seem like a good
match for this are:

 * GPU memory management - At a high level, memory management fits the
   cgroup controller model well, but there are a lot of implementation
   details that would need to be agreed upon before someone starts
   writing a controller for this.  The way GPU memory gets allocated
   and shared between processes adds complexity to the accounting, as
   does the diversity in types and levels of GPU memory supported by
   different vendors' GPUs (especially the difference in what "GPU
   memory" even means on discrete vs. integrated graphics).

 * GPU time (fair scheduler) - If you want to partition execution time
   on a GPU, a cgroup controller is a good match for that.

 * GPU engine/EU partitioning - I'm not familiar with the details of
   the specific hardware you're focusing on, but based on your
   description above, it sounds like it gives you a lot of flexibility
   to slice up your GPU execution units and submit independent
   workloads to arbitrary subsets of them?  If that's true, a cgroup
   controller could be used to balance how many EUs various cgroups
   have access to, or to reserve dedicated subsets of EUs for the
   processes in specific cgroups to help provide QoS guarantees.  I
   don't think most of the hardware I work with is nearly that flexible
   at the EU level, but even on simpler hardware designs a cgroup
   controller could probably partition access to the higher-level
   execution engines (e.g., it would be possible to specify that
   processes from a specific part of the cgroup hierarchy are the only
   ones with any access to the media engine).

On the other hand, some graphics concepts don't really care about the
overall cgroup hierarchy, but would like to make decisions based on
traits that have been assigned to the specific, individual cgroup a
process belongs to.  The specific use case I was working on before was
an example of this: GPU priority in a system with a strictly
priority-based (non-fair, starvation allowed) scheduler.  While GPU
priority shares some similarity with the "GPU time" example above,
priority itself isn't a resource that gets distributed the way "GPU
time" is.  The priority of any individual cgroup is completely
unrelated to the priority of any other cgroup, and the cgroup's
position in the hierarchy isn't interesting.  The consensus when we
discussed this before was that concepts like GPU priority (which are
really just about tagging groups of processes with a setting/value) are
better handled in the DRM subsystem itself; I've appended a rough
sketch of what I mean below my signature.


Matt

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795
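P.S.  For the "not really a controller" case, here's the rough (and
heavily simplified) kind of thing I mean: the driver only needs to know
which cgroup on the default hierarchy the submitting task belongs to,
and can then look up whatever per-cgroup setting (priority, in this
example) it has been told about through some driver/drm interface.
task_dfl_cgroup() is a real helper; drm_lookup_cgroup_priority() is
entirely made up for illustration and stubbed out so the sketch is
self-contained:

#include <linux/cgroup.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/*
 * Hypothetical driver-private lookup of a priority value an admin has
 * associated with a cgroup (e.g. via an ioctl or sysfs); stubbed out
 * here only to keep the sketch self-contained.
 */
static int drm_lookup_cgroup_priority(struct cgroup *cgrp)
{
	return 0;	/* default priority */
}

/* Resolve the GPU priority to use for work submitted by @task. */
static int drm_context_priority(struct task_struct *task)
{
	struct cgroup *cgrp;
	int prio;

	rcu_read_lock();
	cgrp = task_dfl_cgroup(task);	/* task's cgroup on the v2 hierarchy */
	prio = drm_lookup_cgroup_priority(cgrp);
	rcu_read_unlock();

	return prio;
}

Note that nothing here walks or even cares about the hierarchy; the
individual cgroup is just a tag the driver keys its own setting off of,
which is why this doesn't really want to be a controller.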