On Mon, Mar 21, 2022 at 10:37 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello.
>
> On Wed, Mar 09, 2022 at 04:52:11PM +0000, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> > +The new cgroup controller would:
> > +
> > +* Allow setting per-cgroup limits on the total size of buffers charged to it.
>
> What is the meaning of the total? (I only have very naïve
> understanding of the device buffers.)

So "total" is used twice here in two different contexts.

The first one is the global "GPU" cgroup context, as in any buffer that
any exporter claims is a GPU buffer, regardless of where/how it is
allocated. So this refers to the sum of all GPU buffers of any
type/source. An exporter contributes to this total by registering a
corresponding gpucg_device and making charges against that device when
it exports.

The second one is in a per-device context. This allows us to make a
distinction between different types of GPU memory based on who exported
the buffer. A single process can make use of several different types of
dma buffers (for example cached and uncached versions of the same type
of memory), and it would be useful to have different limits for each.
These are distinguished by the device name string chosen when the
gpucg_device is first registered. (There's a rough sketch of this flow
at the end of this mail.)

> Is it like a) there's global pool of memory that is partitioned among
> individual devices or b) each device has its own specific type of memory
> and adding across two devices is adding apples and oranges or c) there
> can be various devices both of a) and b) type?

So I guess the most correct answer to this question is c.

> (Apologies not replying to previous versions and possibly missing
> anything.)
>
> Thanks,
> Michal
>
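To make the exporter flow above a bit more concrete, here is a rough
sketch of how I picture a dma-buf exporter participating. This is
simplified pseudocode; the names and signatures below are illustrative
rather than the exact ones from the series:

    /* Done once, e.g. at driver/heap init: make this exporter's memory
     * type visible to the GPU cgroup controller under its own name.
     * The name is what distinguishes per-device totals and limits. */
    static struct gpucg_device my_gpucg_dev;

    void my_exporter_init(void)
    {
            gpucg_register_device(&my_gpucg_dev, "my-heap-cached");
    }

    /* At export time: charge the buffer size against the exporting
     * process's cgroup, attributed to this particular device. */
    int my_export(struct my_buffer *buf)
    {
            struct gpucg *gpucg = gpucg_get(current);
            int ret;

            ret = gpucg_try_charge(gpucg, &my_gpucg_dev, buf->size);
            if (ret)
                    return ret;

            /* ... create and return the dma-buf as usual ... */
            return 0;
    }

So the per-device totals come from charges like the one above, and the
overall "GPU" total for a cgroup is just the sum of those charges across
all registered devices.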