On Sun, 28 Jan 2024 15:24:09 -0800 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > On Sun, 28 Jan 2024 at 14:51, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > > > I was working on getting rid of ei->dentry, but then I hit: > > > > void eventfs_remove_dir(struct eventfs_inode *ei) > > { [..] > > > > Where it deletes the all the existing dentries in a tree. Is this a > > valid place to keep ei->dentry? > > No, when you remove the directory, just leave the child dentries > alone. Remember: they are purely caches, and you either have My fear is that they can be accessed again when there's no file around. Thus, the dentry and the inode are created when accessed with the fops loaded that calls into the tracing code. For example, when some creates a kprobe: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # ls events/kprobes/sched/enable enable Now I do; # echo > kprobe_events Currently if I do: # ls events/kprobes/sched/enable ls: cannot access 'events/kprobes/sched/enable' But because the dentry doesn't go away immediately, it can cause issues. I rather not force them to be freed every time the dentry goes to zero, as that could cause more overhead in tracing where the tooling already does multiple scans of the eventfs directory for various reasons. If that dentry is still there, and someone does: echo 1 > events/kprobes/sched/enable after the ei was removed, wouldn't it call back into the open callback of the inode represented by that dentry? If that happens, it will call into the kprobe code and reference structures that have already been freed. Note, before adding that simple_recursive_removal() or my open coded version I had earlier, the kprobe files would stick around after the kprobe was freed, and I was able to crash the kernel. > > - somebody is still using it (you can 'rmdir()' a directory that some > other process has as its cwd, for example), which keeps it alive and > active anyway Wouldn't it be bad if the dentry hung around after the rmdir. You don't want to be able to access files after rmdir has finished. > > - when the last user is done, the dcache code will just free the > dentries anyway But it doesn't. That debug code I had before that showed the ei->dentry in show_event_dentries file would show that the dentries exist for some time when their ref count was zero. They only got freed when on drop_caches or memory reclaim. I like the fact that they hang around that we are not constantly allocating them for every reference. > > so there's no reason to remove any of the dentries by hand - and in > fact simple_recursive_removal() never did that anyway for anything > that was still busy. > > For a pure cached set of dentries (that have no users), doing the last > "dput()" on a directory will free that directory dentry, but it will > also automatically free all the unreachable children recursively. But it doesn't free it. It just makes it available to be freed. > > Sure, simple_recursive_removal() does other things (sets inode flags > to S_DEAD, does fsnotify etc), but none of those should actually > matter. > > I think that whole logic is simply left-over from when the dentries > weren't a filesystem cache, but were the *actual* filesystem. So it > actually became actively wrong when you started doing your own backing > store, but it just didn't hurt (except for code legibility). > > Of course, eventfs is slightly odd and special in that this isn't a > normal "rmdir()", so it can happen with files still populated. And > those children will stick around and be useless baggage until they > are shrunk under memory pressure. > > But I don't think it should *semantically* matter, exactly because > they always could stay around anyway due to having users. It does, because things like kprobes will call into tracefs to free its files so that it can free its own resources as the open callback will reference those resources. If those dentries sick around and allow the user to open the file again, it will cause a use after free bug. It does keep track of opens so the kprobe code will not be freed if a task has a reference already. You get an -EBUSY on the removal of kprobes if something has a reference to it. But on deleting, the return from the eventfs code that deleted the directory, is expected that there will be no more opens on the kprobe files. With stale dentries hanging around after the directory is deleted, that is not the case. > > There are some cacheability knobs like > > .d_delete = always_delete_dentry, > > which probably makes sense for any virtual filesystem (it limits > caching of dentries with no more users - normally the dcache will > happily keep caching dentries for the *next* user, but that may or may > not make sense for virtual filesystems) But if it were freed every time there's no more users, then when you do a stat() it will get referenced then freed (as the dentry ref count goes to zero after the stat() is completed). then the open/write/close will create and delete it, and this could be done several times for the same file. If the dentry is freed every time its ref count goes to zero, then it will need to be created for every operation. > > So there is tuning that can be done, but in general, I think you > should actively think of the dcache as just "it's just a cache, leave > stale stuff alone" I would like to keep it as a cache, but deleting every time the ref count goes to zero is not much of a cache. -- Steve