On Tue, Nov 12, 2024 at 12:22 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, 11 Nov 2024 at 14:46, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > > > Did you see the patch that added the > > fsnotify_file_has_pre_content_watches() thing? > > No, because I had gotten to patch 6/11, and it added this open thing, > and there was no such thing in any of the patches before it. > > It looks like you added FSNOTIFY_PRE_CONTENT_EVENTS in 11/17. > > However, at no point does it look like you actually test it at open > time, so none of this seems to matter. > > As far as I can see, even at the end of the series, you will call the > fsnotify hook at open time even if there are no content watches on the > file. > > So apparently the fsnotify_file_has_pre_content_watches() is not > called when it should be, and when it *is* called, it's also doing > completely the wrong thing. > > Look, for basic operations THAT DON'T CARE, you now added a function > call to fsnotify_file_has_pre_content_watches(), that function call > looks at inode->i_sb->s_iflags (doing two D$ accesses that shouldn't > be done!), and then after that looks at the i_fsnotify_mask. > > THIS IS EXACTLY THE KIND OF GARBAGE I'M TALKING ABOUT. > > This code has been written by somebody who NEVER EVER looked at > profiles. You're following chains of pointers when you never should. > > Look, here's a very basic example of the kind of complete mis-design > I'm talking about: > > - we're doing a basic read() on a file that isn't being watched. > > - we want to maybe do read-ahead > > - the code does > > if (fsnotify_file_has_pre_content_watches(file)) > return fpin; > > to say that "don't do read-ahead". > > Fine, I understand the concept. But keep in mind that the common case > is presumably that there are no content watches. > > And even ignoring the "common case" issue, that's the one you want to > OPTIMIZE for. That's the case that matters for performance, because > clearly if there are content watches, you're going to go into "Go > Slow" mode anyway and not do pre-fetching. So even if content watches > are common on some load, they are clearly not the case you should do > performance optimization for. > > With me so far? > > So if THAT is the case that matters, then dammit, we shouldn't be > calling a function at all. > > And when calling the function, we shouldn't start out with this > completely broken logic: > > struct inode *inode = file_inode(file); > __u32 mnt_mask = real_mount(file->f_path.mnt)->mnt_fsnotify_mask; > > if (!(inode->i_sb->s_iflags & SB_I_ALLOW_HSM)) > return false; > > that does random crap and looks up some "mount mask" and looks up the > superblock flags. > > Why shouldn't we do this? > > BECAUSE NONE OF THIS MATTERS IF THE FILE HASN'T EVEN BEEN MARKED FOR > CONTENT MATCHES! > > See why I'm shouting? You're doing insane things, and you're doing > them for all the cases that DO NOT MATTER. You're doing all of this > for the common case that doesn't want to see that kind of mindless > overhead. > > You literally check for the "do I even care" *last*, when you finally > do that fsnotify_object_watched() check that looks at the inode. But > by then you have already wasted all that time and effort, and > fsnotify_object_watched() is broken anyway, because it's stupidly > designed to require that mnt_mask that isn't needed if you have > properly marked each object individually. > > So what *should* you have? > > You should have had a per-file flag saying "Do I need to even call > this crud at all", and have it in a location where you don't need to > look at anything else. > > And fsnotify already basically has that flag, except it's mis-designed > too. We have FMODE_NONOTIFY, which is the wrong way around (saying > "don't notify", when that should just be the *default*), and the > fsnotify layer uses it only to mark its own internal files so that it > doesn't get called recursively. So that flag that *looks* sane and is > in the right location is actually doing the wrong thing, because it's > dealing with a rare special case, not the important cases that > actually matter. > > So all of this readahead logic - and all of the read and write hooks - > should be behind a simple "oh, this file doesn't have any notification > stuff, so don't bother calling any fsnotify functions". > > So I think the pattern should be > > static inline bool fsnotify_file_has_pre_content_watches(struct file *file) > { > if (unlikely(file->f_mode & FMODE_NOTIFY)) > return out_of_line_crud(file); > return false; > } > I think that's a good idea for pre-content events, because it's fine to say that if the sb/mount was not watched by a pre-content event listener at the time of file open, then we do not care. The problem is that legacy inotify/fanotify watches can be added after file is open, so that is allegedly why this optimization was not done for fsnotify hooks in the past. Thanks, Amir.