Back in 2022 we already had a session at LSFMM where we talked about
eventfs, and we said that it should be based on kernfs, with any missing
functionality implemented in kernfs. Instead we've gotten a hand-rolled
version of similar functionality and 100+ mail exchanges over the last
weeks to fix bugs in it, tying up people's time.

All we've heard so far were claims that it would be too difficult to port
tracefs to kernfs or that it somehow wouldn't work, but we've never heard
why and it's never been demonstrated.

So I went and started a draft for porting all of tracefs to kernfs in the
hope that someone picks this up and finishes the work. I've gotten the
core of it done and it's pretty easy to do logical copy-pasta to port this
to eventfs as well.

I want to see tracefs and eventfs ported to kernfs and the hand-rolled
implementation removed. I don't see the value in any additional talks
about why eventfs is special until we've seen an implementation of tracefs
on kernfs.

I'm pretty certain that we have capable people who can and want to finish
the port (I frankly don't have time for this unless I drop all reviews).

I started jotting down the basics yesterday evening and came to the
following conclusions:

* It'll get rid of the pointless dentry pinning that is currently done in
  various places. Instead only a kernfs root and a kernfs node need to be
  stashed. Dentries and inodes are added on demand.

* It'll make _all of_ tracefs capable of on-demand dentry and inode
  creation.

* Quoting [1]:

  > The biggest savings in eventfs is the fact that it has no meta data for
  > files. All the directories in eventfs has a fixed number of files when they
  > are created. The creating of a directory passes in an array that has a list
  > of names and callbacks to call when the file needs to be accessed. Note,
  > this array is static for all events. That is, there's one array for all
  > event files, and one array for all event systems, they are not allocated per
  > directory.

  This is all possible with kernfs.

* All ownership information (mode, uid, gid) is stashed and kept in
  kernfs_node->iattr. So the parent kernfs_node's ownership can be used to
  set the child's ownership information. This will allow us to get rid of
  any custom permission checking and the ->getattr() and ->setattr() calls.

* Private tracefs data that was stashed in inode->i_private is stashed in
  kernfs_node->priv. That's always accessible in kernfs_ops->open() calls
  via kernfs_open_file->kn->priv, but it could also be transferred to
  kernfs_open_file->priv. In any case, it makes private data a lot easier
  to handle than it is in tracefs now.

* It'll make maintenance of tracefs easier in the long run because new
  functionality and improvements get added to kernfs, including better
  integration with namespaces (I had patchsets for kernfs a while ago to
  unlock additional namespaces).

* There's no need for separate i_ops for "instances" and regular tracefs
  directories. Simply compare the stashed kernfs_node of the "instances"
  directory against the current kernfs_node passed to ->mkdir() or
  ->rmdir() to decide whether the directory creation or deletion is
  allowed.

* Frankly, another big reason to do it is simply maintenance. All of the
  maintenance burden needs to be shifted to the generic kernfs
  implementation, which is maintained by people familiar with filesystem
  details. I'm willing to support it too.

  No shade, but currently I don't see how eventfs can be maintained
  without the involvement of others. Maintainability alone should be a
  sufficient reason to move all of this to kernfs and add any missing
  functionality.

* If we have a session about this at LSFMM, I want to see a POC of tracefs
  and eventfs built on top of kernfs. I'm tired of talking about a private
  implementation of functionality that already exists. Otherwise, this is
  just wasting everyone's time and eventfs as it is will not become common
  infrastructure.

* Yes, debugfs could or should be ported as well, but it's almost
  irrelevant for debugfs. It's a debugging filesystem. If you enable it on
  a production workload then you have bigger problems to worry about than
  wasted memory. So I don't consider that urgent. But tracefs is causing
  us headaches right now and I'm wary of cementing a hand-rolled
  implementation.

So really, please let's move this to kernfs, fix anything that isn't
supported in kernfs (I haven't seen anything so far) and get rid of all
the custom functionality.

Part of the work is moving tracefs to the new mount api (which should've
been done anyway).

The fs/tracefs/ part already compiles. The rest I haven't finished
converting. All the file_operations need to be moved to kernfs_ops, which
shouldn't be too difficult.

To: Steven Rostedt <rostedt@xxxxxxxxxxx>
To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
To: Amir Goldstein <amir73il@xxxxxxxxx>
To: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20240129105726.2c2f77f0@xxxxxxxxxxxxxxxxxx [1]
Link: https://lore.kernel.org/r/20240129105726.2c2f77f0@xxxxxxxxxxxxxxxxxx

---
Christian Brauner (4):
      [DRAFT]: tracefs: port to kernfs
      [DRAFT]: trace: stash kernfs_node instead of dentries
      [DRAFT]: hwlat: port struct file_operations thread_mode_fops to struct kernfs_ops
      [DRAFT]: trace: illustrate how to convert basic open functions

 fs/kernfs/mount.c                 |  10 +
 fs/tracefs/inode.c                | 649 +++++++++++++-------------------------
 include/linux/kernfs.h            |   3 +
 include/linux/tracefs.h           |  18 +-
 kernel/trace/trace.c              |  22 +-
 kernel/trace/trace.h              |   4 +-
 kernel/trace/trace_events_synth.c |   4 +-
 kernel/trace/trace_events_user.c  |   2 +-
 kernel/trace/trace_hwlat.c        |  45 +--
 9 files changed, 270 insertions(+), 487 deletions(-)
---
base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
change-id: 20240131-tracefs-kernfs-3f2def6eab11
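
For anyone picking this up, here is a small userspace sketch of three of the
points made in the list above: ownership inherited from the parent node,
private data kept on the node itself, and a single mkdir/rmdir check against
the stashed "instances" node. This is not kernfs code and not what the draft
patches contain. struct model_node is a hypothetical stand-in for struct
kernfs_node, and model_init_child() and model_mkdir_allowed() are made-up
helper names used only for illustration.

```c
/*
 * Userspace model only. "struct model_node" is a hypothetical stand-in for
 * struct kernfs_node; none of these helpers are kernfs or tracefs APIs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	struct model_node *parent;
	const char *name;
	unsigned int mode;	/* permission bits, kept on the node itself */
	unsigned int uid;
	unsigned int gid;
	void *priv;		/* per-node private data (cf. kernfs_node->priv) */
};

/* Initialise a child so that it inherits uid/gid from its parent node. */
static void model_init_child(struct model_node *child,
			     struct model_node *parent,
			     const char *name, unsigned int mode, void *priv)
{
	assert(parent != NULL);
	child->parent = parent;
	child->name = name;
	child->mode = mode;
	child->uid = parent->uid;
	child->gid = parent->gid;
	child->priv = priv;
}

/*
 * One mkdir/rmdir policy for the whole tree: the operation is allowed only
 * when the target parent is the stashed "instances" node, so no separate
 * set of operations is needed for that one directory.
 */
static bool model_mkdir_allowed(const struct model_node *instances,
				const struct model_node *parent)
{
	return instances != NULL && parent == instances;
}
```

In this model, a caller would stash a pointer to the "instances" node once at
setup time and pass it to model_mkdir_allowed() together with whatever parent
node the mkdir or rmdir request targets. A real port would do the equivalent
comparison on kernfs_node pointers.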