On Tue, Nov 12, 2024 at 1:37 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, 11 Nov 2024 at 16:00, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > I think that's a good idea for pre-content events, because it's fine > > to say that if the sb/mount was not watched by a pre-content event listener > > at the time of file open, then we do not care. > > Right. > > > The problem is that legacy inotify/fanotify watches can be added after > > file is open, so that is allegedly why this optimization was not done for > > fsnotify hooks in the past. > > So honestly, even if the legacy fsnotify hooks can't look at the file > flag, they could damn well look at an inode flag. > Legacy fanotify has a mount watch (FAN_MARK_MOUNT), which is the common way for Anti-malware to set watches on filesystems, so I am not sure what you are saying. > And I'm not even convinced that we couldn't fix them to just look at a > file flag, and say "tough luck, somebody opened that file before you > started watching, you don't get to see what they did". That would specifically break tail -f (for inotify) and probably many other tools, but as long as we also look at the inode flags (i_fsnotify_mask) and the dentry flags (DCACHE_FSNOTIFY_PARENT_WATCHED), then I think we may be able to get away with changing the semantics for open files on a fanotify mount watch. Specifically, I would really like to eliminate completely the cost of FAN_ACCESS_PERM event, which could be gated on file flag, because this is only for security/Anti-malware and I don't think this event is practically useful and it sure does not need to guarantee permission events to mount watchers on already open files. > > So even if we don't look at a file->f_mode flag, the lergacy cases > should look at i_fsnotify_mask, and do that *first*. > > IOW, not do it like fsnotify_object_watched() does now, which is just > broken. Again, it looks at inode->i_sb->s_fsnotify_mask completely > pointlessly, but it also does it much too late - it gets called after > we've already called into the fsnotify() code and have messed up the > I$ etc. > > The "linode->i_sb->s_fsnotify_mask" is not only an extra indirection, > it should be very *literally* pointless. If some bit isn't set in > i_sb->s_fsnotify_mask, then there should be no way to set that bit in > inode->i_fsnotify_mask. So the only time we should access > i_sb->s_fsnotify_mask is when i_notify_mask gets *modified*, not when > it gets tested. > i_fsnotify_mask is the cumulative mask of all inode watchers s_fsnotify_mask is *not* the cumulative of all i_fsnotify_mask s_fsnotify_mask is the cumulative mask of all sb watchers mnt_fsnotify_marks is the cumulative mask of all mount watchers > But even if that silly and pointless i_sb->s_fsnotify_mask thing is > removed, fsnotify_object_watched() is *still* wrong, because it > requires that mnt_mask as an argument, which means that the caller now > has to look it up - all this entirely pointless work that should never > be done if the bit wasn't set in inode->i_fsnotify_mask. > > So I really think fsnotify is doing *everything* wrong. Note the difference between fsnotify_sb_has_watchers() and fsnotify_object_watched(). The former is an early optimization gate that checks if there are any inode/mount/sb watchers (per class) on the filesystem, regardless of specific events and specific target inode/file. We could possibly further optimize fsnotify_sb_has_watchers() to avoid access to ->s_fsnotify_info by setting sb flag (for each priority class). The latter checks if any inode/mount/sb are interested in a specific event on the said object. In upstream code, fsnotify_object_watched() is always gated behind fsnotify_sb_has_watchers(), which tries to prevent the indirect call. The new fsnotify_file_has_pre_content_watches() helper could have checked fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT); but it is better to gate by file flag as you suggested. > > And I most certainly don't want to add more runtime hooks to > *critical* code like open/read/write. > > Right now, many of the fsnotify things are for "metadata", ie for > bigger file creation / removal / move etc. And yes, the "don't do this > if there are no fsnotify watchers AT ALL" does actually end up meaning > that most of the time I never see any of it in profiles, because the > fsnotify_sb_has_watchers() culls out that case. > > And while the fsnotify_sb_has_watchers() thing is broken garbage and > does too many indirections and is not testing the right thing, at > least it's inlined and you don't get the function calls. > > That doesn't make fsnotify "right", but at least it's not in my face. > I see the sb accesses, and I hate them, but it's usually at least > hidden. Admittedly not as well hidden as it *should* be, since it does > the access tests in the wrong order, but the old fsnotify_open() > doesn't strike me as "terminally broken". > > It doesn't have a permission test after the open has already done > things, and it's inlined enough that it isn't actively offensive. > > And most of the other fsnotify things have the same pattern - not > great, but not actively offensive. > > These new patches make it in my face. > > So I do require that the *new* cases at least get it right. The fact > that we have old code that is misdesigned and gets it wrong and should > also be improved isn't an excuse to add *more* badly coded stuff. > > And yes, if somebody fixes the old fsnotify stuff to check just the > i_fsnotify_mask in the inline function, and moves all the other silly > checks out-of-line, that would be an improvement. I'd very much > applaud that. But it's a separate thing from adding new hooks. We will do both. Thanks, Amir.