On 21/07/2023 23:21, Tejun Heo wrote:
> On Wed, Jul 12, 2023 at 12:46:04PM +0100, Tvrtko Ursulin wrote:
>> $ cat drm.memory.stat
>> card0 region=system total=12898304 shared=0 active=0 resident=12111872 purgeable=167936
>> card0 region=stolen-system total=0 shared=0 active=0 resident=0 purgeable=0
>>
>> Data is generated on demand for simplicity of implementation, i.e. no
>> running totals are kept or accounted during migrations and such. Various
>> optimisations, such as cheaper collection of data, are possible but
>> deliberately left out for now.
>>
>> Overall, the feature is deemed to be useful to container orchestration
>> software (and manual management).
>>
>> Limits, either soft or hard, are not envisaged to be implemented on top
>> of this approach due to the on-demand nature of collecting the stats.
>
> So, yeah, if you want to add memory controls, we better think through how
> the fd ownership migration should work.
It would be quite easy to make the implicit migration fail: just a matter
of failing the first ioctl (which is what triggers the migration) after
the file descriptor is accessed from a new owner.
But I don't think I can really add that in the RFC given I have no hard
controls or anything like that.
With GPU usage throttling it doesn't really apply, at least I don't
think it does, since even when migrated to a lower-budget group a client
would just get immediately de-prioritised.
I don't think hard GPU time limits are feasible in general, and while
soft limits might be, again I don't see that any limiting would
necessarily have to run immediately on implicit migration.
The second part of the story is hypothetical/future memory controls.
I think the first thing to say is that implicit migration is important,
but it is not really an established pattern to use the file descriptor
from two places or to migrate more than once. It is simply a fresh fd
which gets sent to clients from Xorg, which is one of the legacy ways of
doing things.
So we probably can just ignore that, given that no significant amount of
memory ownership would be getting migrated.
And for drm.memory.stat I think what I have is good enough: both private
and shared data get accounted for any client that has handles to the
buffers in question.
Maarten was working on memory controls so maybe he would have more
thoughts on memory ownership and implicit migration.
But I don't think there is anything incompatible between that and
drm.memory.stat as proposed here, given that the categories reported are
the established ones from the DRM fdinfo spec, and it is simply a fact
that we can have multiple memory regions per driver.
The main thing that would change between this RFC and future memory
controls in the area of drm.memory.stat is the implementation: it would
have to get changed under the hood from "collect on query" to "account
at allocation/free/etc". But those are just implementation details.
Regards,
Tvrtko