On 20/10/2022 12:33, Christian König wrote:
Am 20.10.22 um 09:34 schrieb Tvrtko Ursulin:
On 20/10/2022 07:40, Christian König wrote:
Am 19.10.22 um 19:32 schrieb Tvrtko Ursulin:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
To enable propagation of settings from the cgroup drm controller to
drm we
need to start tracking which processes own which drm clients.
Implement that by tracking the struct pid pointer of the owning
process in
a new XArray, pointing to a structure containing a list of associated
struct drm_file pointers.
Clients are added and removed under the filelist mutex and RCU list
operations are used below it to allow for lockless lookup.
That won't work easily like this. The problem is that file_priv->pid
is usually not accurate these days:
From the debugfs clients file:
systemd-logind 773 0 y y 0 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
firefox 2945 128 n n 1000 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
chrome 35940 128 n n 1000 0
chrome 35940 0 n y 1000 1
chrome 35940 0 n y 1000 2
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
Xorg 1639 128 n n 1000 0
This is with glxgears and a bunch other OpenGL applications running.
The problem is that for most applications the X/Wayland server is now
opening the render node. The only exceptions in this case are apps
using DRI2 (VA-API?).
I always wanted to fix this and actually track who is using the file
descriptor instead of who opened it, but never had the time to do this.
There's a patch later in the series which allows client records to be
migrated to a new PID, and then i915 patch to do that when fd is used
for context create. That approach I think worked well enough in the
past. So maybe it could be done in the DRM core at some suitable entry
point.
Yeah, that makes some sense. I think you should wire that inside
drm_ioctl(), as far as I know more or less all uses of a file descriptor
would go through that function.
And maybe make that a stand alone patch, cause that can go upstream as a
bug fix independently if you ask me.
I've put it on my todo list to try and come up with something standalone
for this problem. Will see if I manage to send it separately or perhaps
will start the next cgroup controller RFC with those patches.
I think you need to fix this problem first. And BTW: and unsigned
long doesn't work as PID either with containers.
This I am not familiar with so would like to hear more if you could
point me in the right direction at least.
Uff, I'm the wrong person to ask stuff like that. I just can say from
experience because I've ran into that trap as well.
My assumption was that struct pid *, which is what I store in unsigned
long, would be unique in a system where there is a single kernel
running, so as long as lifetimes are correct (released from tracking
here when fd is closed, which is implicit on process exit) would work.
You are suggesting that is not so?
I think you should have the pointer to struct pid directly here since
that is a reference counted structure IIRC. But don't ask me what the
semantics is how to get or put a reference.
Yeah I think I have all that. I track struct pid, with a reference, in
drm client, and release it when file descriptor is closed (indirectly
via the DRM close hook). All I need, I think, is for that mapping to
answer me "which drm_file objects" are in use by this struct pid
pointer. I don't see a problem with lifetimes or scope yet.
Regards,
Tvrtko