Re: [RFC PATCH 0/5] cgroup support for GPU devices

Tejun Heo <tj@xxxxxxxxxx> · Mon, 6 May 2019 08:26:43 -0700

Hello,

On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
> *  control allocation of device memory (reuse of memcg)
> and with future work, we could extend to:
> *  track and control share of GPU time (reuse of cpu/cpuacct)
> *  apply mask of allowed execution engines (reuse of cpusets)
> 
> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup.  This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.

I'm really skeptical about this approach.  When creating resource
controllers, I think what's the most important and challenging is
establishing resource model - what resources are and how they can be
distributed.  This patchset is going the other way around - building
out core infrastructure for bolierplates at a significant risk of
mixing up resource models across different types of resources.

IO controllers already implement per-device controls.  I'd suggest
following the same interface conventions and implementing a dedicated
controller for the subsystem.

Thanks.

-- 
tejun