On Mon 11-11-24 21:11:01, Amir Goldstein wrote:
> We got a report that adding a fanotify filesystem watch prevents tail -f
> from receiving events.
>
> Reproducer:
>
> 1. Create 3 windows / login sessions. Become root in each session.
> 2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
> 3. In the first window, run: fsnotifywait -S -m /boot
> 4. In the second window, run: echo data >> /boot/foo
> 5. In the third window, run: tail -f /boot/foo
> 6. Go back to the second window and run: echo more data >> /boot/foo
> 7. Observe that the tail command doesn't show the new data.
> 8. In the first window, hit control-C to interrupt fsnotifywait.
> 9. In the second window, run: echo still more data >> /boot/foo
> 10. Observe that the tail command in the third window has now printed
>     the missing data.
>
> When stracing tail, we observed that when fanotify filesystem mark is
> set, tail does get the inotify event, but the event is received with
> the filename:
>
> read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0foo\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 50) = 32
>
> This is unexpected, because tail is watching the file itself and not its
> parent and is inconsistent with the inotify event received by tail when
> fanotify filesystem mark is not set:
>
> read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 50) = 16
>
> The interference between different fsnotify groups was caused by the fact
> that the mark on the sb requires the filename, so the filename is passed
> to fsnotify(). Later on, fsnotify_handle_event() tries to take care of
> not passing the filename to groups (such as inotify) that are interested
> in the filename only when the parent is watching.
>
> But the logic was incorrect for the case that no group is watching the
> parent, some groups are watching the sb and some watching the inode.
>
> Reported-by: Miklos Szeredi <miklos@xxxxxxxxxx>
> Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent")
> Cc: stable@xxxxxxxxxxxxxxx # 5.10+
> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>

Thanks for analysis, Amir!

> @@ -333,12 +333,14 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask,
>  	if (!inode_mark)
>  		return 0;
>
> -	if (mask & FS_EVENT_ON_CHILD) {
> +	if (mask & FS_EVENTS_POSS_ON_CHILD) {

So this is going to work but as far as I'm reading the code in
fsnotify_handle_event() I would be maybe calmer if we instead wrote the
condition as:

	if (!(mask & ALL_FSNOTIFY_DIRENT_EVENTS))

I.e., if the event on the inode is not expecting name & dir, clear them.
Instead of your variant which I understand as: "if we could have added name
& dir only for parent, clear it now". The bitwise difference between these
two checks is:

FS_DELETE_SELF | FS_MOVE_SELF | FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED |
FS_ERROR, none of which should matter. Maybe I'm paranoid but we already had
too many subtle bugs in this code so I'm striving for maximum robustness :).
What do you think?

								Honza

>  		/*
>  		 * Some events can be sent on both parent dir and child marks
>  		 * (e.g. FS_ATTRIB). If both parent dir and child are
>  		 * watching, report the event once to parent dir with name (if
>  		 * interested) and once to child without name (if interested).
> +		 *
> +		 * In any case, whether the parent is watching or not watching,
>  		 * The child watcher is expecting an event without a file name
>  		 * and without the FS_EVENT_ON_CHILD flag.
>  		 */
> --
> 2.34.1
>
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
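
For anyone reading the two strace buffers quoted above, here is a minimal
userspace sketch that decodes them. It is only an illustration, not part of
the patch: struct ev_header and decode_event() are hypothetical names, the
header layout mirrors the fixed part of struct inotify_event as documented in
inotify(7), and a little-endian host is assumed (as the byte order in the
strace output suggests).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size part of struct inotify_event (see inotify(7)). */
struct ev_header {
	int32_t  wd;      /* watch descriptor */
	uint32_t mask;    /* event mask, 0x2 is IN_MODIFY */
	uint32_t cookie;  /* only used to pair rename events */
	uint32_t len;     /* size of the name field, padding included */
};

/*
 * Hypothetical helper: decode the first event in buf and return its total
 * size. *name points at the name field, or at "" when no name was sent.
 */
size_t decode_event(const unsigned char *buf, size_t buf_len,
		    struct ev_header *h, const char **name)
{
	assert(buf_len >= sizeof(*h));
	memcpy(h, buf, sizeof(*h));
	assert(buf_len >= sizeof(*h) + h->len);
	*name = h->len ? (const char *)buf + sizeof(*h) : "";
	return sizeof(*h) + h->len;
}

/* Bytes tail read while the fanotify filesystem mark was set (32 bytes). */
const unsigned char with_sb_mark[32] = {
	1, 0, 0, 0,  2, 0, 0, 0,  0, 0, 0, 0,  16, 0, 0, 0,
	'f', 'o', 'o',  /* remaining 13 bytes are NUL padding */
};

/* Bytes tail read without the fanotify filesystem mark (16 bytes). */
const unsigned char without_sb_mark[16] = {
	1, 0, 0, 0,  2, 0, 0, 0,  /* cookie and len are zero */
};
```

Decoding with_sb_mark gives wd 1, mask 0x2, len 16 and the name "foo", for
32 bytes in total. Decoding without_sb_mark gives the same wd and mask with
len 0 and no name, for 16 bytes in total. The only difference between the two
events is the unexpected name field.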