Re: [RFC v3 1/8] gpu: rfc: Proposal for a GPU cgroup controller

Michal Koutný <mkoutny@xxxxxxxx> · Wed, 23 Mar 2022 11:40:20 +0100

On Tue, Mar 22, 2022 at 08:41:55AM -0700, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> So "total" is used twice here in two different contexts.
> The first one is the global "GPU" cgroup context. As in any buffer
> that any exporter claims is a GPU buffer, regardless of where/how it
> is allocated. So this refers to the sum of all gpu buffers of any
> type/source. An exporter contributes to this total by registering a
> corresponding gpucg_device and making charges against that device when
> it exports.
> The second one is in a per device context. This allows us to make a
> distinction between different types of GPU memory based on who
> exported the buffer. A single process can make use of several
> different types of dma buffers (for example cached and uncached
> versions of the same type of memory), and it would be useful to have
> different limits for each. These are distinguished by the device name
> string chosen when the gpucg_device is first registered.

So is the following understanding correct?

(assuming gpu.memory.current had a "total" line analogous to the one in
gpu.memory.max)
	$ cat gpu.memory.current
	total T
	dev1  d1
	...
	devN  dN

T = Σ di + RAM_backed_buffers

and some of RAM_backed_buffers may also be accounted in memory.current
(case by case, depending on the allocator).
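
To make the arithmetic concrete, here is a minimal sketch of how userspace
could derive the RAM-backed remainder from such a file. It assumes the
hypothetical "total" plus per-device layout shown above, which is not an
existing ABI; the function names and the sample numbers are made up for
illustration.

```python
# Sketch only: the "total" + per-device layout is the hypothetical
# gpu.memory.current format discussed in this thread, not an existing ABI.

def parse_gpu_memory_current(text):
    """Return (total, {device_name: value}) from 'name value' lines."""
    total = None
    per_device = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, value = fields[0], int(fields[1])
        if name == "total":
            total = value
        else:
            per_device[name] = value
    if total is None:
        raise ValueError("no 'total' line found")
    return total, per_device


def ram_backed_remainder(text):
    """RAM_backed_buffers = T - sum(d_i), per the relation above."""
    total, per_device = parse_gpu_memory_current(text)
    return total - sum(per_device.values())


# Made-up sample values, purely illustrative.
sample = "total 4096\ndev1 1024\ndev2 2048\n"
print(ram_backed_remainder(sample))
```

Whether part of that remainder also shows up in memory.current cannot be
read from this file alone; that depends on the allocator, as noted above.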

Thanks,
Michal