On Fri, Jul 28, 2023 at 4:36 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
>
> On 27/07/2023 21:58, Alex Deucher wrote:
> > We have a number of customers using these stats, but the issue that
> > keeps coming up is the CPU overhead to gather them, particularly on
> > systems with hundreds of processes using the GPU. Has anyone given
> > any thought to having a single interface to get this information for
> > the entire GPU in one place?
>
> Could I have a framed "told you so" certificate please? :D
>
> Well, it at least depends on how much CPU overhead your users would be
> happy to eliminate and how much to keep. So maybe no need for that
> certificate just yet.
>
> I have been raising the issue of the multiplicative complexity of
> walking "total number of processes" x "total number of file
> descriptors" on a system since the inception of fdinfo.
>
> For that issue, the idea was to perhaps expose a list of pids with DRM
> fds open somewhere, maybe in sysfs.
>
> That would eliminate walking _all_ processes and trying to parse all
> of their file descriptors.
>
> But it would still require walking all file descriptors belonging to
> the processes with DRM fds open.
>
> If that wouldn't be enough of a saving for your users then no, I am
> not aware it was discussed. That is assuming you were suggesting
> something like "read all fdinfo for all clients" in one blob. Also in
> sysfs? I think it would be doable by walking dev->filelist and
> invoking drm_show_fdinfo() on each entry.

Yes, something like that. Generally, for telemetry purposes, they want
to see the engine-time and memory-usage stats of all of the clients on
the GPU. In our case there is also a lot of CPU overhead in parsing
the memory usage, due to the locking of the buffer objects needed to
get their location. I'm not sure of a good way to reduce that. Maybe
caching it to reduce the update granularity.

> Out of curiosity, are they using the fdinfo parsing code from IGT or
> something of their own?

I think it's a mix.

Alex
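
[Editor's sketch, for reference: a rough, untested take on the "one
blob" interface discussed above, walking dev->filelist and calling the
driver's fdinfo hook for each open client. drm_clients_show and the
"drm-pid" key are made-up names; filelist_mutex, drm_seq_file_printer()
and the show_fdinfo driver callback do exist in kernels of this
vintage. A debugfs seq_file is assumed rather than sysfs, since sysfs
discourages multi-record files.

#include <linux/pid.h>
#include <linux/seq_file.h>
#include <drm/drm_device.h>
#include <drm/drm_drv.h>
#include <drm/drm_file.h>
#include <drm/drm_print.h>

/* Dump fdinfo-style stats for every open client of one device in a
 * single read, so userspace no longer has to walk /proc at all.
 */
static int drm_clients_show(struct seq_file *m, void *data)
{
        struct drm_device *dev = m->private;
        struct drm_printer p = drm_seq_file_printer(m);
        struct drm_file *file;

        mutex_lock(&dev->filelist_mutex);
        list_for_each_entry(file, &dev->filelist, lhead) {
                drm_printf(&p, "drm-client-id:\t%llu\n", file->client_id);
                /* "drm-pid" is not a real fdinfo key, just illustrative. */
                drm_printf(&p, "drm-pid:\t%d\n", pid_vnr(file->pid));
                if (dev->driver->show_fdinfo)
                        dev->driver->show_fdinfo(&p, file);
                drm_printf(&p, "\n");
        }
        mutex_unlock(&dev->filelist_mutex);

        return 0;
}

This keeps the per-driver fdinfo code unchanged and only adds an
aggregating reader around it.]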
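
[Editor's sketch of the caching idea for the buffer-object memory
stats: refresh the totals at most once per second, so readers normally
hit a cached copy instead of taking the BO reservation locks. All
names here (struct client_mem_cache, client_mem_get, collect_mem_stats)
are hypothetical; the point is only the time-based staleness check.

#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/types.h>

struct client_mem_cache {
        struct mutex lock;
        unsigned long last_update;      /* jiffies of last refresh */
        u64 vram_bytes;
        u64 gtt_bytes;
};

/* Stand-in for the expensive walk that locks each BO to read back
 * its current placement and accumulate the totals into the cache.
 */
void collect_mem_stats(struct client_mem_cache *c);

static void client_mem_get(struct client_mem_cache *c, u64 *vram, u64 *gtt)
{
        mutex_lock(&c->lock);
        if (time_after(jiffies, c->last_update + HZ)) {
                collect_mem_stats(c);
                c->last_update = jiffies;
        }
        *vram = c->vram_bytes;
        *gtt = c->gtt_bytes;
        mutex_unlock(&c->lock);
}

The trade-off is that the reported numbers can be up to a second
stale, which is usually acceptable for telemetry-style sampling.]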