On Thu, Apr 13, 2023 at 9:40 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
>
> On 13/04/2023 14:27, Daniel Vetter wrote:
> > On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> >>>>>
> >>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>> <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>>>>>>>
> >>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>
> >>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>> v3: Do it in core
> >>>>>>>>
> >>>>>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> >>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@xxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>>  Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>  drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>  include/drm/drm_file.h                |  1 +
> >>>>>>>>  include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>  4 files changed, 117 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>  indicating kibi- or mebi-bytes.
> >>>>>>>>
> >>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>> +than a single handle).
> >>>>>>>> +
> >>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>> +
> >>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>
> >>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>> drm-memory-<region> keys.
> >>>>>>
> >>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>> drm-memory-<region> keys ;-)
> >>>>>>
> >>>>>> I wouldn't have minded drm-memory-{active,resident,...} but it
> >>>>>> could be mis-parsed by existing userspace, so my hands were a bit tied.
> >>>>>>
> >>>>>>> How about we introduce the concept of a memory region from the start and
> >>>>>>> use naming similar to what we do for engines?
> >>>>>>>
> >>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>
> >>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>
> >>>>>>> 'size' - All reachable objects
> >>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>> 'resident' - Objects with backing store
> >>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>
> >>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>> it right), which could be desirable for a simplified mental model.
> >>>>>>>
> >>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>> 'resident' above. In any case we can document which category the key
> >>>>>>> with no category component is equal to, and that at most one of the two
> >>>>>>> must be output.)
> >>>>>>>
> >>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>> driver defined.
> >>>>>>>
> >>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>> region they support.
> >>>>>>>
> >>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>
> >>>>>> I'm not too sure how to reconcile different memory regions with this,
> >>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>> just don't use the helper? Or??
> >>>>>
> >>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>> all works out reasonably consistently?
> >>>>
> >>>> That is basically what we have now. I could append -system to each to
> >>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>
> >>> What you have isn't really -system, but everything. So it doesn't really
> >>> make sense to me to mark this -system, it's only really true for integrated
> >>> (if they don't have stolen or something like that).
> >>>
> >>> Also my comment was more in reply to Tvrtko's suggestion.
> >>
> >> Right, so my proposal was drm-memory-$CATEGORY-$REGION, which I think aligns
> >> with the current drm-memory-$REGION by extending it, rather than creating
> >> confusion with a different order of key name components.
> >
> > Oh my comment was pretty much just bikeshed, in case someone creates a
> > $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > So $CATEGORY before the -memory.
> >
> > Otoh I don't think that'll happen, so I guess we can go with whatever more
> > folks like :-) I don't really care much personally.
>
> Okay, I missed the parsing problem.
>
> >> AMD currently has (among others) drm-memory-vram, which we could define in
> >> the spec as mapping to category X when the category component is not present.
> >>
> >> Some examples:
> >>
> >> drm-memory-resident-system:
> >> drm-memory-size-lmem0:
> >> drm-memory-active-vram:
> >>
> >> Etc.. I think it creates a consistent story.
> >>
> >> Other than this, the two significant opens which I think haven't been
> >> addressed yet are:
> >>
> >> 1)
> >>
> >> Why do we want totals (not per region) when userspace can trivially
> >> aggregate if it wants to? What is the use case?
> >>
> >> 2)
> >>
> >> The current proposal limits the values to whole objects and cements that by
> >> having it in the common code. If/when some driver is able to support sub-BO
> >> granularity they will need to opt out of the common printer, at which point
> >> it may be less churn to start with a helper rather than a mid-layer. Or maybe
> >> some drivers already support this, I don't know. Given how important VM BIND
> >> is I wouldn't be surprised.
> >
> > I feel like for drivers using ttm we want a ttm helper which takes care of
> > the region printing in hopefully a standard way. And that could then also
> > take care of all kinds of partial binding and funny rules (like maybe
> > we want a standard vram region that adds up all the lmem regions on
> > intel, so that all dgpus have a common vram bucket that generic tools
> > understand?).
>
> First part yes, but for the second I would think we want to avoid any
> aggregation in the kernel which can be done in userspace just as well.
> Such a total vram bucket would be pretty useless even on Intel, since
> userspace needs to be region aware to make use of all resources. It
> could even be counter-productive I think - "why am I getting out of
> memory when half of my vram is unused!?".
>
> > It does mean we walk the bo list twice, but *shrug*. People have been
> > complaining about procutils for decades, they're still horrible, I think
> > walking bo lists twice internally in the ttm case is going to be ok. If
> > not, it's internals, we can change them again.
> >
> > Also I'd lean a lot more towards making ttm a helper and not putting that
> > into core, exactly because it's pretty clear we'll need more flexibility
> > when it comes to accurate stats for multi-region drivers.
>
> Exactly.

It could also be that the gem->status() fxn is extended to return
_which_ pool that object is in.. but either way, we aren't painting
ourselves into a corner.

> > But for a first "how much gpu space does this app use" across everything I
> > think this is a good enough starting point.
>
> Okay, so we agree this would be better as a helper and not in the core.
>
> On the point of whether the keys/semantics are good enough as a starting
> point, I am still not convinced the kernel should aggregate - instead we
> should start from day one by appending -system (or something) to Rob's
> proposed keys.

I mean, if addition were expensive I might agree about not aggregating ;-)

BR,
-R

> Regards,
>
> Tvrtko
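
P.S. To make the category/region naming discussion a bit more concrete,
here is a rough sketch of how a driver- or ttm-side helper could emit
per-region drm-memory-$CATEGORY-$REGION keys. This is illustration only,
not the posted patch: the struct and fdinfo_print_mem_region() names are
made up for the example, and only drm_printf() is an existing DRM
interface.

#include <linux/types.h>
#include <drm/drm_print.h>

struct fdinfo_mem_region_stats {
	const char *region;	/* e.g. "system", "vram", "lmem0" */
	u64 size;		/* all reachable objects */
	u64 shared;		/* subset of 'size' with handle_count > 1 */
	u64 resident;		/* objects with backing store */
	u64 active;		/* subset of 'resident' currently in use by the GPU */
	u64 purgeable;		/* subset of 'resident' marked purgeable */
};

static void fdinfo_print_mem_region(struct drm_printer *p,
				    const struct fdinfo_mem_region_stats *s)
{
	/* One drm-memory-$CATEGORY-$REGION line per category, in KiB */
	drm_printf(p, "drm-memory-size-%s:\t%llu KiB\n", s->region, s->size >> 10);
	drm_printf(p, "drm-memory-shared-%s:\t%llu KiB\n", s->region, s->shared >> 10);
	drm_printf(p, "drm-memory-resident-%s:\t%llu KiB\n", s->region, s->resident >> 10);
	drm_printf(p, "drm-memory-active-%s:\t%llu KiB\n", s->region, s->active >> 10);
	drm_printf(p, "drm-memory-purgeable-%s:\t%llu KiB\n", s->region, s->purgeable >> 10);
}

A driver with a region named "vram" would then produce fdinfo lines along
the lines of:

drm-memory-size-vram:      8192 KiB
drm-memory-resident-vram:  4096 KiB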