On Wed, 10 Jan 2024 12:45:36 +0100
Christian Brauner <brauner@xxxxxxxxxx> wrote:

> So say you do:
>
>   mkdir /sys/kernel/tracing/instances/foo
>
> After this has returned we know everything we need to know about the new
> tracefs instance including the ownership and the mode of all inodes in
> /sys/kernel/tracing/instances/foo/events/* and below precisely because
> ownership is always inherited from the parent dentry and recorded in the
> metadata struct eventfs_inode.
>
> So say someone does:
>
>   open("/sys/kernel/tracing/instances/foo/events/xfs");
>
> and say this is the first time that someone accesses that events/
> directory.
>
> When the open pathwalk is done, the vfs will determine via
>
>   [1] may_lookup(inode_of(events))
>
> whether you are able to list entries such as "xfs" in that directory.
> The vfs checks inode_permission(MAY_EXEC) on "events" and if that holds
> it ends up calling i_op->eventfs_root_lookup(events).
>
> At this point tracefs/eventfs adds the inodes for all entries in that
> "events" directory including "xfs" based on the metadata it recorded
> during the mkdir. Since now someone is actually interested in them. And
> it initializes the inodes with ownership and everything and adds the
> dentries that belong into that directory.
>
> Nothing here depends on the permissions of the caller. The only
> permission that mattered was done in the VFS in [1]. If the caller has
> permissions to enter a directory they can lookup and list its contents.
> And its contents were determined/fixed etc when mkdir was called.
>
> So we just need to add the required objects into the caches (inode,
> dentry) whose addition we intentionally deferred until someone actually
> needed them.
>
> So, eventfs_root_lookup() now initializes the inodes with the ownership
> from the stored metadata or from the parent dentry and splices in inodes
> and dentries. No permission checking is needed for this because it is
> always a recheck of what the vfs did in [1].
>
> We now return to the vfs and path walk continues to the final component
> that you actually want to open which is that "xfs" directory in this
> example. We check the permissions on that inode via may_open("xfs") and
> we open that directory returning an fd to userspace ultimately.
>
> (I'm going by memory since I need to step out the door.)

So, let's say we do:

  chgrp -R rostedt /sys/kernel/tracing/

But I don't want rostedt to have access to xfs:

  chgrp -R root /sys/kernel/tracing/events/xfs

Both actions will create the inodes and dentries of all files and
directories (because of "-R"). But once that is done, the ref counts go to
zero. They stay around until reclaim. But then I open Chrome ;-) and it
reclaims all the dentries and inodes, so we are back to where we were on
boot.

Now as rostedt I do:

  ls /sys/kernel/tracing/events/xfs

The VFS layer doesn't know if I have permission to that or not, because
all the inodes and dentries have been freed. It has to call back to
eventfs to find out, at which point eventfs_root_lookup() and
eventfs_iterate_shared() will recreate the inodes with the proper
permissions.

Or are you saying that I don't need the ".permission" callback, because
eventfs does it when it creates the inodes? But for eventfs to know what
the permission changes are, it uses .getattr and .setattr.

-- Steve
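
The chgrp / reclaim / ls sequence above can be sketched as a user-space toy
model. This is not kernel code and not how eventfs is implemented: the names
ToyEventsDir, Inode, group_may_enter, ROSTEDT_GID and the "sched" entry are
all hypothetical, and the permission check is reduced to a group-only test.
It only illustrates the point under discussion, namely that if a setattr-style
hook records ownership changes in persistent metadata, a later lookup can
recreate the inode with the right group without looking at the caller.

```python
from dataclasses import dataclass

ROOT_GID = 0
ROSTEDT_GID = 1000  # hypothetical gid standing in for the "rostedt" group


@dataclass
class Inode:
    """Cached object; reclaim may throw it away at any time."""
    gid: int
    mode: int


class ToyEventsDir:
    """Toy model of a lazily populated events/ directory (an assumption,
    not the real eventfs data structures)."""

    def __init__(self, names, gid=ROOT_GID, mode=0o750):
        self.gid = gid      # group of the directory itself
        self.mode = mode
        # Persistent metadata, loosely analogous to struct eventfs_inode:
        # None means "inherit from the parent directory".
        self.saved_gid = {name: None for name in names}
        self.cache = {}     # name -> Inode, emptied by reclaim()

    def lookup(self, name):
        """Recreate the inode from stored metadata; no caller check here."""
        if name not in self.cache:
            gid = self.saved_gid[name]
            self.cache[name] = Inode(self.gid if gid is None else gid, self.mode)
        return self.cache[name]

    def setattr(self, name, gid):
        """Record a chgrp so that it survives reclaim of the inode."""
        self.saved_gid[name] = gid
        self.lookup(name).gid = gid

    def reclaim(self):
        """Memory pressure: cached inodes go away, metadata stays."""
        self.cache.clear()


def group_may_enter(inode, gids):
    """Simplified group-only check for read+exec on a directory."""
    return inode.gid in gids and (inode.mode & 0o050) == 0o050


events = ToyEventsDir(["xfs", "sched"])
events.gid = ROSTEDT_GID
for name in ("xfs", "sched"):
    events.setattr(name, ROSTEDT_GID)   # chgrp -R rostedt .../tracing/
events.setattr("xfs", ROOT_GID)         # chgrp -R root .../events/xfs
events.reclaim()                        # "then I open Chrome"
```

In this model, a lookup of "xfs" after reclaim() comes back with the root
group, so a caller whose only group is ROSTEDT_GID fails group_may_enter(),
while "sched" still passes. Whether the real code also needs a ".permission"
callback on top of this is exactly the open question in the thread; the sketch
does not answer it.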