Hello, Daniel.

On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote:
> > Hmm... what'd be the fundamental difference from slab or socket memory
> > which are handled through memcg? Does system memory used by GPUs have
> > further global restrictions in addition to the amount of physical
> > memory used?
>
> Sometimes, but that would be specific resources (kinda like vram),
> e.g. CMA regions used by a gpu. But probably not something you'll run
> in a datacenter and want cgroups for ...
>
> I guess we could try to integrate with the memcg group controller. One
> trouble is that aside from i915 most gpu drivers do not really have a
> full shrinker, so not sure how that would all integrate.

So, while it'd be great to have shrinkers in the longer term, that's not
a strict requirement for being accounted in memcg. It already accounts a
lot of memory which isn't reclaimable (a lot of slabs and socket buffers,
for example).

> The overall gpu memory controller would still be outside of memcg I
> think, since that would include swapped-out gpu objects, and stuff in
> special memory regions like vram.

Yeah, for resources which are on the GPU itself or hard limitations
arising from it. In general, we wanna make cgroup controllers control
something real and concrete, as in physical resources.

> > At the system level, it just gets folded into cpu time, which isn't
> > perfect but is usually a good enough approximation of compute related
> > dynamic resources. Can gpu do something similar or at least start with
> > that?
>
> So generally there's a pile of engines, often of different types (e.g.
> amd hw has an entire pile of copy engines), with some ill-defined
> sharing characteristics for some (often compute/render engines use the
> same shader cores underneath), kinda like hyperthreading. So at that
> level of detail it's all extremely hw specific, and probably too hard
> to control in a useful way for users. And I'm not sure we can really do
> a reasonable knob for overall gpu usage, e.g. if we include all the
> copy engines but the workloads are only running on compute engines,
> then you might only get 10% overall utilization by engine-time, while
> the shaders (which are most of the chip area/power consumption) are
> actually at 100%. On top of that, with many userspace apis those
> engines are an internal implementation detail of a more abstract gpu
> device (e.g. opengl), but with others this is all fully exposed (like
> vulkan).
>
> Plus the kernel needs to use at least the copy engines for vram
> management itself, and you really can't take that away. Although Kenny
> here has a proposal for a separate cgroup resource just for that.
>
> I just think it's all a bit too ill-defined, and we might be better
> off nailing the memory side first and getting some real-world
> experience with this stuff. For context, there's not even a
> cross-driver standard for how priorities are handled; that's all
> driver-specific interfaces.

I see. Yeah, figuring it out as this develops makes sense to me.

One thing I wanna raise is that, in general, we don't want to expose
device or implementation details in the cgroup interface. What we want
expressed there are the intentions of the user. The more internal
details we expose, the more we end up getting tied down to a specific
implementation, which we should avoid, especially given the early stage
of development.

Thanks.

-- 
tejun
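
For reference, here is a minimal sketch of the kind of accounting the
reply above alludes to: memcg already charges unreclaimable kernel
allocations when they are made with __GFP_ACCOUNT (GFP_KERNEL_ACCOUNT),
which is how accounted slabs and socket buffers are covered without a
shrinker, and a gpu driver could in principle charge its system-memory
backing allocations the same way. This is not code from the thread; the
gpu_bo_priv structure and helper are hypothetical, for illustration
only.

	#include <linux/slab.h>
	#include <linux/mm.h>
	#include <linux/gfp.h>

	/* Hypothetical per-object driver metadata, for illustration only. */
	struct gpu_bo_priv {
		size_t size;
		void *backing;
	};

	static struct gpu_bo_priv *gpu_bo_priv_create(size_t size)
	{
		struct gpu_bo_priv *priv;

		/*
		 * GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT: the
		 * allocation is charged to the allocating task's memory
		 * cgroup even though nothing here is reclaimable, the same
		 * way accounted slab and socket buffer memory is handled.
		 */
		priv = kzalloc(sizeof(*priv), GFP_KERNEL_ACCOUNT);
		if (!priv)
			return NULL;

		priv->size = size;
		priv->backing = kvzalloc(size, GFP_KERNEL_ACCOUNT);
		if (!priv->backing) {
			kfree(priv);
			return NULL;
		}

		return priv;
	}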