[RFC] Generic cgroup controller for the gpu/drm subsystem

Kenny Ho <y2kenny@xxxxxxxxx> · Mon, 29 Oct 2018 19:49:13 -0400

(Resending in plain text)

Hi,

We are thinking of using cgroup to manage resources in GPUs.  I
believe Matt Roper from Intel has been trying to do something similar.
From previous discussions
(https://www.spinics.net/lists/cgroups/msg18687.html), the cgroup
framework appears to not want to have a full-fledged cgroup controller
for Matt's use case but I am not sure if I understood the rationale.
It's also unclear to me if our use case matches Matt's.  We are hoping
to have a better understanding of the situation before embarking on a
path that may ultimately be unacceptable to upstream.  To that end, I
will outline our (AMD) use case at a high level and perhaps folks on
this list can give some suggestions?

Our use case comes from the data center and cluster world.
Currently we have a rudimentary mechanism to expose GPUs to a
container cluster running Kubernetes
(https://github.com/RadeonOpenCompute/k8s-device-plugin) but it only
exposes whole GPUs.  That means multiple containers cannot share the
same GPU.  A well-established way to share a GPU is to use
SR-IOV/virtualization but it shares the GPU in time slices.

An alternative is to share the GPU by its constituents.  Perhaps a
good way to think about this is to treat the GPU like a mini-computer.
A GPU has memory (VRAM) and it also has compute units (but instead of
10s of cores, it has 100s to 1000s of shaders/CUs).  So we can
potentially share a GPU along those two dimensions.  Similar to a
computer, a GPU also has specialized hardware so we can potentially
share those blocks separately as well.
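
To make the two-dimension idea a bit more concrete, here is a toy
user-space model in Python.  It is only a sketch of the accounting we
have in mind, not a proposed kernel interface: the GpuGroup class, its
field names and the 16 GiB / 64 CU GPU are all made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuGroup:
    """Hypothetical per-container GPU budget (not an existing interface)."""
    vram_limit: int      # bytes of VRAM the group may hold
    cu_limit: int        # number of compute units the group may occupy
    vram_used: int = 0
    cu_used: int = 0

    def try_charge(self, vram: int = 0, cus: int = 0) -> bool:
        """All-or-nothing charge: refuse if either limit would be exceeded."""
        if self.vram_used + vram > self.vram_limit:
            return False
        if self.cu_used + cus > self.cu_limit:
            return False
        self.vram_used += vram
        self.cu_used += cus
        return True

    def uncharge(self, vram: int = 0, cus: int = 0) -> None:
        """Give resources back, never dropping usage below zero."""
        self.vram_used = max(0, self.vram_used - vram)
        self.cu_used = max(0, self.cu_used - cus)


GIB = 1 << 30
# Two containers splitting an assumed 16 GiB, 64-CU GPU in half.
a = GpuGroup(vram_limit=8 * GIB, cu_limit=32)
b = GpuGroup(vram_limit=8 * GIB, cu_limit=32)
```

With this model, a.try_charge(vram=6 * GIB, cus=16) succeeds, while a
following a.try_charge(vram=4 * GIB) is refused because it would push
the group past its 8 GiB budget.  Group b is unaffected either way.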

Unlike a computer, however, GPUs are not as well "standardized" as a
desktop or a server.  For the gpu/drm subsystem, there are some things
that are common (such as buffer sharing and buffer lifetime
management), some things that are shared by some vendors (software
scheduler) and some things that are very much vendor specific.  Because
of this, a generic cgroup controller for drm may need to be more
pluggable than other cgroup controllers.  We took a look at the rdma
cgroup as part of our research but rdma appears to have resources that
are more abstracted and standardized.
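
As a rough illustration of what "pluggable" could mean, the toy Python
model below keeps a generic core that knows nothing about individual
resources and lets the common drm layer and vendor drivers register
their own.  Everything here is an assumption made for the sake of the
example: the DrmCgroupModel class and the resource names are
hypothetical and do not correspond to any existing kernel code.

```python
from typing import Dict


class DrmCgroupModel:
    """Toy generic core: resources are plugged in by name at runtime."""

    def __init__(self) -> None:
        self._limits: Dict[str, int] = {}
        self._usage: Dict[str, int] = {}

    def register_resource(self, name: str, limit: int) -> None:
        """Plug in a resource (from the common layer or a vendor driver)."""
        if name in self._limits:
            raise ValueError(f"resource {name!r} already registered")
        self._limits[name] = limit
        self._usage[name] = 0

    def try_charge(self, name: str, amount: int) -> bool:
        """Charge a registered resource; unknown names are rejected."""
        if name not in self._limits:
            return False
        if self._usage[name] + amount > self._limits[name]:
            return False
        self._usage[name] += amount
        return True

    def usage(self, name: str) -> int:
        """Current usage of a registered resource."""
        return self._usage[name]


ctrl = DrmCgroupModel()
# Common to all drm drivers (hypothetical name).
ctrl.register_resource("buffer.bytes", 4 << 30)
# Shared by the vendors that use a software scheduler (hypothetical name).
ctrl.register_resource("sched.timeslice_us", 10_000)
# Vendor specific (hypothetical name).
ctrl.register_resource("vendor.compute_units", 32)
```

The three registrations mirror the three tiers above.  A driver that
has no notion of compute units would simply never register that
resource, and charges against it would be rejected by the core.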

What do you think?  Does drm/gpu warrant its own full-fledged cgroup controller?

Regards,
Kenny Ho