(Resending in plain text)

Hi,

We are thinking of using cgroup to manage resources in GPUs. I believe Matt Roper from Intel has been trying to do something similar. From previous discussions (https://www.spinics.net/lists/cgroups/msg18687.html), the cgroup framework maintainers appear not to want a full-fledged cgroup controller for Matt's use case, but I am not sure I understood the rationale. It's also unclear to me whether our use case matches Matt's. We are hoping to get a better understanding of the situation before embarking on a path that may ultimately be unacceptable to upstream. To that end, I will outline our (AMD) use case at a high level, and perhaps folks on this list can give some suggestions?

Our use case comes from the world of data centers and clusters. Currently we have a rudimentary mechanism to expose GPUs to a container cluster running Kubernetes (https://github.com/RadeonOpenCompute/k8s-device-plugin), but it only exposes whole GPUs. That means multiple containers cannot share the same GPU. A well-established way to share a GPU is to use SRIOV/virtualization, but that shares the GPU in time slices. An alternative is to share the GPU by its constituents.

Perhaps a good way to think about this is to treat the GPU like a mini-computer. A GPU has memory (VRAM) and it also has compute units (but instead of tens of cores, it has hundreds to thousands of shaders/CUs). So we can potentially share a GPU along those two dimensions. Similar to a computer, a GPU also has specialized hardware, so we can potentially share those blocks separately as well.

Unlike a computer, however, GPUs are not as well "standardized" as a desktop or a server. In the gpu/drm subsystem, some things are common to all drivers (such as buffer sharing and buffer lifetime management), some are shared by several vendors (the software scheduler), and some are very much vendor specific. Because of this, a generic cgroup controller for drm may need to be more pluggable than other cgroup controllers.
We took a look at the rdma cgroup as part of our research, but rdma appears to have resources that are more abstracted and standardized.

What do you think? Does drm/gpu warrant its own full-fledged cgroup controller?

Regards,
Kenny Ho