On Wed, Nov 13, 2024 at 2:43 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 11-11-24 21:11:01, Amir Goldstein wrote:
> > We got a report that adding a fanotify filesystem watch prevents tail -f
> > from receiving events.
> >
> > Reproducer:
> >
> > 1. Create 3 windows / login sessions. Become root in each session.
> > 2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
> > 3. In the first window, run: fsnotifywait -S -m /boot
> > 4. In the second window, run: echo data >> /boot/foo
> > 5. In the third window, run: tail -f /boot/foo
> > 6. Go back to the second window and run: echo more data >> /boot/foo
> > 7. Observe that the tail command doesn't show the new data.
> > 8. In the first window, hit control-C to interrupt fsnotifywait.
> > 9. In the second window, run: echo still more data >> /boot/foo
> > 10. Observe that the tail command in the third window has now printed
> >     the missing data.
> >
> > When stracing tail, we observed that when fanotify filesystem mark is
> > set, tail does get the inotify event, but the event is received with
> > the filename:
> >
> > read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0foo\0\0\0\0\0\0\0\0\0\0\0\0\0",
> > 50) = 32
> >
> > This is unexpected, because tail is watching the file itself and not its
> > parent and is inconsistent with the inotify event received by tail when
> > fanotify filesystem mark is not set:
> >
> > read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 50) = 16
> >
> > The interference between different fsnotify groups was caused by the fact
> > that the mark on the sb requires the filename, so the filename is passed
> > to fsnotify(). Later on, fsnotify_handle_event() tries to take care of
> > not passing the filename to groups (such as inotify) that are interested
> > in the filename only when the parent is watching.
> >
> > But the logic was incorrect for the case that no group is watching the
> > parent, some groups are watching the sb and some watching the inode.
> >
> > Reported-by: Miklos Szeredi <miklos@xxxxxxxxxx>
> > Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent")
> > Cc: stable@xxxxxxxxxxxxxxx # 5.10+
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>
> Thanks for analysis, Amir!
>
> > @@ -333,12 +333,14 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask,
> >       if (!inode_mark)
> >               return 0;
> >
> > -     if (mask & FS_EVENT_ON_CHILD) {
> > +     if (mask & FS_EVENTS_POSS_ON_CHILD) {
>
> So this is going to work but as far as I'm reading the code in
> fsnotify_handle_event() I would be maybe calmer if we instead wrote the
> condition as:
>
>       if (!(mask & ALL_FSNOTIFY_DIRENT_EVENTS))

The problem is that the comment below
"Some events can be sent on both parent dir and child marks..."
is relevant in the context of FS_EVENTS_POSS_ON_CHILD
and FS_EVENT_ON_CHILD, meaning those are exactly the set of events
that could be sent to parent with FS_EVENT_ON_CHILD and to child
without it.

The comment makes no sense in the context of the
ALL_FSNOTIFY_DIRENT_EVENTS check, unless we add a comment saying
the dirent events set has zero intersection with events possible
on child.

>
> I.e., if the event on the inode is not expecting name & dir, clear them.
> Instead of your variant which I understand as: "if we could have added name
> & dir only for parent, clear it now". The bitwise difference between these
> two checks is: FS_DELETE_SELF | FS_MOVE_SELF | FS_UNMOUNT | FS_Q_OVERFLOW |
> FS_IN_IGNORED | FS_ERROR, none of which should matter. Maybe I'm paranoid
> but we already had too many subtle bugs in this code so I'm striving for
> maximum robustness :). What do you think?

How about a
BUILD_BUG_ON(FS_EVENTS_POSS_ON_CHILD & ALL_FSNOTIFY_DIRENT_EVENTS)
with a comment to clarify?

> BTW, I can just massage the patch on commit since you're now busy with HSM
> stuff but I wanted to check what's your opinion on the change.

Sure, no problem.

Thanks,
Amir.
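
For illustration only, here is a minimal userspace sketch of the two
candidate checks and of the proposed compile-time guard. It is not the
kernel code: the mask values are assumed to mirror
include/linux/fsnotify_backend.h and should be checked against the tree,
static_assert stands in for BUILD_BUG_ON, and the two helper functions
are hypothetical names introduced just for this comparison.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumed values, modeled on include/linux/fsnotify_backend.h. */
#define FS_ACCESS          0x00000001u
#define FS_MODIFY          0x00000002u
#define FS_ATTRIB          0x00000004u
#define FS_CLOSE_WRITE     0x00000008u
#define FS_CLOSE_NOWRITE   0x00000010u
#define FS_OPEN            0x00000020u
#define FS_MOVED_FROM      0x00000040u
#define FS_MOVED_TO        0x00000080u
#define FS_CREATE          0x00000100u
#define FS_DELETE          0x00000200u
#define FS_DELETE_SELF     0x00000400u
#define FS_MOVE_SELF       0x00000800u
#define FS_OPEN_EXEC       0x00001000u
#define FS_OPEN_PERM       0x00010000u
#define FS_ACCESS_PERM     0x00020000u
#define FS_OPEN_EXEC_PERM  0x00040000u
#define FS_RENAME          0x10000000u

#define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO)
#define ALL_FSNOTIFY_PERM_EVENTS \
	(FS_OPEN_PERM | FS_ACCESS_PERM | FS_OPEN_EXEC_PERM)

/* Events that may be reported to a parent watching its children. */
#define FS_EVENTS_POSS_ON_CHILD \
	(ALL_FSNOTIFY_PERM_EVENTS | FS_ACCESS | FS_MODIFY | FS_ATTRIB | \
	 FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | FS_OPEN | FS_OPEN_EXEC)

/* Events on directory entries, which always carry name and dir. */
#define ALL_FSNOTIFY_DIRENT_EVENTS \
	(FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)

/*
 * Userspace stand-in for the suggested BUILD_BUG_ON(): the build fails
 * if an event is ever both "possible on child" and a dirent event.
 */
static_assert((FS_EVENTS_POSS_ON_CHILD & ALL_FSNOTIFY_DIRENT_EVENTS) == 0,
	      "dirent events must not overlap events possible on child");

/* Hypothetical helper: the condition used in the patch. */
static inline int clear_name_poss_on_child(uint32_t mask)
{
	return (mask & FS_EVENTS_POSS_ON_CHILD) != 0;
}

/* Hypothetical helper: the alternative condition from the review. */
static inline int clear_name_not_dirent(uint32_t mask)
{
	return (mask & ALL_FSNOTIFY_DIRENT_EVENTS) == 0;
}
```

Under these assumed values both helpers agree for FS_MODIFY (clear the
name) and for FS_CREATE (keep it); they differ only for events outside
both sets, such as FS_DELETE_SELF, which matches the bitwise difference
listed in the review.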