Re: CPU overhead for drm fdinfo stats

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Fri, 28 Jul 2023 09:36:28 +0100

On 27/07/2023 21:58, Alex Deucher wrote:
> We have a number of customers using these stats, but the issue that
> keeps coming up is the CPU overhead to gather them, particularly on
> systems with hundreds of processes using the GPU.  Has anyone given
> any thought to having a single interface to get this information for
> the entire GPU in one place?

Could I have a framed "told you so" certificate, please? :D

Well, at least it depends on how much CPU overhead your users would be
happy to eliminate and how much to keep. So maybe no need for that
certificate just yet.

I have been raising the issue of the multiplicative cost of walking
"total number of processes" x "total number of file descriptors" on a
system since the inception of fdinfo.

So for that issue the idea was to perhaps expose a list of pids with DRM 
fds open somewhere, maybe sysfs.

That would eliminate walking _all_ processes and trying to parse any of
their file descriptors.

But it would still require walking all file descriptors belonging to 
processes with DRM fds open.
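
For illustration, the walk described above can be sketched roughly as
follows. This is a minimal sketch, not the IGT parser: the function name
scan_drm_fdinfo and the proc_root parameter are hypothetical, and it
assumes DRM clients are recognised by the "drm-driver" key that the DRM
usage stats documentation lists for fdinfo. It also counts how many
fdinfo files had to be read, which is the cost under discussion.

```python
import os


def scan_drm_fdinfo(proc_root="/proc"):
    """Walk every pid and every fd, keeping only fdinfo with drm-* keys.

    Returns (clients, files_read), where clients is a list of
    (pid, fd, fields) tuples and files_read is the number of fdinfo
    files that had to be opened to find them.
    """
    clients = []
    files_read = 0
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self"
        fdinfo_dir = os.path.join(proc_root, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            if not fd.isdigit():
                continue
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    text = f.read()
            except OSError:
                continue  # fd was closed while we were scanning
            files_read += 1
            fields = {}
            for line in text.splitlines():
                key, sep, value = line.partition(":")
                if sep and key.startswith("drm-"):
                    fields[key] = value.strip()
            # Assumption: a DRM client always reports "drm-driver".
            if "drm-driver" in fields:
                clients.append((int(pid), int(fd), fields))
    return clients, files_read
```

Comparing files_read with len(clients) on a busy system gives a rough
idea of how much of the work is wasted. A list of pids with DRM fds open
would shrink the outer loop only; the inner loop over each such
process's file descriptors would remain.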

If that wouldn't be enough of a saving for your users then no, I am not
aware that it was discussed. That is assuming you were suggesting
something like "read all fdinfo for all clients" in one blob. Also in
sysfs? I think it would be doable by walking the dev->filelist and
invoking drm_show_fdinfo() on each entry.

Out of curiosity, are they using the fdinfo parsing code from IGT or
something of their own?

Regards,

Tvrtko