On Thu, Nov 14, 2024 at 4:01 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 13-11-24 19:49:31, Amir Goldstein wrote:
> > From 7a2cd74654a53684d545b96c57c9091e420b3add Mon Sep 17 00:00:00 2001
> > From: Amir Goldstein <amir73il@xxxxxxxxx>
> > Date: Tue, 12 Nov 2024 13:46:08 +0100
> > Subject: [PATCH] fsnotify: opt-in for permission events at file open time
> >
> > Legacy inotify/fanotify listeners can add watches for events on inode,
> > parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> > were already open at the time of setting up the watches.
> >
> > fanotify permission events are typically used by Anti-malware software,
> > that is watching the entire mount and it is not common to have more than
> > one Anti-malware engine installed on a system.
> >
> > To reduce the overhead of the fsnotify_file_perm() hooks on every file
> > access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> > events only if there were *any* permission event listeners on the
> > filesystem at the time that the file was opened.
> >
> > The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> > two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> > event types to report.
> >
> > This is going to apply to the new fanotify pre-content events in order
> > to reduce the cost of the new pre-content event vfs hooks.
> >
> > Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@xxxxxxxxxxxxxx/
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>
> Couple of notes below.
>
> > diff --git a/fs/open.c b/fs/open.c
> > index 226aca8c7909..194c2c8d8cd4 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -901,7 +901,7 @@ static int do_dentry_open(struct file *f,
> >          f->f_sb_err = file_sample_sb_err(f);
> >
> >          if (unlikely(f->f_flags & O_PATH)) {
> > -                f->f_mode = FMODE_PATH | FMODE_OPENED;
> > +                f->f_mode = FMODE_PATH | FMODE_OPENED | FMODE_NONOTIFY;
> >                  f->f_op = &empty_fops;
> >                  return 0;
> >          }
> > @@ -929,6 +929,12 @@ static int do_dentry_open(struct file *f,
> >          if (error)
> >                  goto cleanup_all;
> >
> > +        /*
> > +         * Set FMODE_NONOTIFY_* bits according to existing permission watches.
> > +         * If FMODE_NONOTIFY was already set for an fanotify fd, this doesn't
> > +         * change anything.
> > +         */
> > +        f->f_mode |= fsnotify_file_mode(f);
>
> Maybe it would be more obvious to do this like:
>
>         file_set_fsnotify_mode(f);
>
> Because currently this depends on the details of how exactly FMODE_NONOTIFY
> is encoded.
>

ok. makes sense.

> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 70359dd669ff..dd583ce7dba8 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> >
> >  #define FMODE_NOREUSE           ((__force fmode_t)(1 << 23))
> >
> > -/* FMODE_* bit 24 */
> > -
> >  /* File is embedded in backing_file object */
> > -#define FMODE_BACKING           ((__force fmode_t)(1 << 25))
> > +#define FMODE_BACKING           ((__force fmode_t)(1 << 24))
> > +
> > +/* File shouldn't generate fanotify pre-content events */
> > +#define FMODE_NONOTIFY_HSM      ((__force fmode_t)(1 << 25))
> >
> > -/* File was opened by fanotify and shouldn't generate fanotify events */
> > -#define FMODE_NONOTIFY          ((__force fmode_t)(1 << 26))
> > +/* File shouldn't generate fanotify permission events */
> > +#define FMODE_NONOTIFY_PERM     ((__force fmode_t)(1 << 26))
> >
> >  /* File is capable of returning -EAGAIN if I/O will block */
> >  #define FMODE_NOWAIT            ((__force fmode_t)(1 << 27))
> > @@ -190,6 +191,21 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> >  /* File does not contribute to nr_files count */
> >  #define FMODE_NOACCOUNT         ((__force fmode_t)(1 << 29))
> >
> > +/*
> > + * The two FMODE_NONOTIFY_ bits used together have a special meaning of
> > + * not reporting any events at all including non-permission events.
> > + * These are the possible values of FMODE_NOTIFY(f->f_mode) and their meaning:
> > + *
> > + * FMODE_NONOTIFY_HSM - suppress only pre-content events.
> > + * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
> > + * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
> > + */
> > +#define FMODE_NONOTIFY_MASK \
> > +        (FMODE_NONOTIFY_HSM | FMODE_NONOTIFY_PERM)
> > +#define FMODE_NONOTIFY FMODE_NONOTIFY_MASK
> > +#define FMODE_NOTIFY(mode) \
> > +        ((mode) & FMODE_NONOTIFY_MASK)
>
> This looks a bit error-prone to me (FMODE_NONOTIFY looks like another FMODE
> flag but in fact it is not which is an invitation for subtle bugs) and the
> tests below which are sometimes done as (FMODE_NOTIFY(mode) == xxx) and
> sometimes as (file->f_mode & xxx) are inconsistent and confusing (unless you
> understand what's happening under the hood).
>
> So how about defining macros like FMODE_FSNOTIFY_NORMAL(),
> FMODE_FSNOTIFY_CONTENT() and FMODE_FSNOTIFY_PRE_CONTENT() which evaluate to
> true if we should be sending normal/content/pre-content events to the file.
> With appropriate comments this should make things more obvious.
>

ok, maybe something like this:

#define FMODE_FSNOTIFY_NONE(mode) \
        (FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY)
#define FMODE_FSNOTIFY_NORMAL(mode) \
        (FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY_PERM)
#define FMODE_FSNOTIFY_PERM(mode) \
        (!((mode) & FMODE_NONOTIFY_PERM))
#define FMODE_FSNOTIFY_HSM(mode) \
        (FMODE_FSNOTIFY(mode) == 0)

At least this keeps the double negatives contained in one place.

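To make the two-bit encoding easier to check, here is a minimal userspace
sketch of the macros proposed above. It is not kernel code: fmode_t is a plain
unsigned int, the __force annotation is dropped, and FMODE_FSNOTIFY() is
assumed to be the mask-extraction macro that the quoted patch calls
FMODE_NOTIFY(). The helpers fsnotify_event_classes() and
fsnotify_mode_selftest() are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's fmode_t (assumption for this sketch). */
typedef unsigned int fmode_t;

/* Bit positions as in the quoted patch. */
#define FMODE_NONOTIFY_HSM      ((fmode_t)(1u << 25))   /* no pre-content events */
#define FMODE_NONOTIFY_PERM     ((fmode_t)(1u << 26))   /* no permission events */
#define FMODE_NONOTIFY          (FMODE_NONOTIFY_HSM | FMODE_NONOTIFY_PERM)

/* Assumed equivalent of FMODE_NOTIFY(mode) from the quoted patch. */
#define FMODE_FSNOTIFY(mode)    ((mode) & FMODE_NONOTIFY)

/* Predicates as proposed in the reply. */
#define FMODE_FSNOTIFY_NONE(mode) \
        (FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY)
#define FMODE_FSNOTIFY_NORMAL(mode) \
        (FMODE_FSNOTIFY(mode) == FMODE_NONOTIFY_PERM)
#define FMODE_FSNOTIFY_PERM(mode) \
        (!((mode) & FMODE_NONOTIFY_PERM))
#define FMODE_FSNOTIFY_HSM(mode) \
        (FMODE_FSNOTIFY(mode) == 0)

/*
 * Hypothetical helper: number of event classes reported for a mode.
 * 0 = nothing, 1 = normal events only, 2 = normal + permission,
 * 3 = normal + permission + pre-content.
 */
static int fsnotify_event_classes(fmode_t mode)
{
        if (FMODE_FSNOTIFY_NONE(mode))
                return 0;
        if (FMODE_FSNOTIFY_NORMAL(mode))
                return 1;
        if (FMODE_FSNOTIFY_HSM(mode))
                return 3;
        return 2;       /* only FMODE_NONOTIFY_HSM is set */
}

/* Walk all four states; unrelated f_mode bits must not affect the result. */
static void fsnotify_mode_selftest(void)
{
        fmode_t other = 1u << 3;        /* arbitrary unrelated bit */

        assert(fsnotify_event_classes(other) == 3);
        assert(fsnotify_event_classes(other | FMODE_NONOTIFY_HSM) == 2);
        assert(fsnotify_event_classes(other | FMODE_NONOTIFY_PERM) == 1);
        assert(fsnotify_event_classes(other | FMODE_NONOTIFY) == 0);

        /* PERM is a single-bit test, so it is true in both upper states. */
        assert(FMODE_FSNOTIFY_PERM(other));
        assert(FMODE_FSNOTIFY_PERM(other | FMODE_NONOTIFY_HSM));
        assert(!FMODE_FSNOTIFY_PERM(other | FMODE_NONOTIFY));
}
```

Note that in this sketch FMODE_FSNOTIFY_NORMAL() is true only in the state
where normal events are the only ones reported; it is false for a file that
gets all events.
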
And then we have these users in the final code:

static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
{
        return file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode));
}

static inline int fsnotify_open_perm(struct file *file)
{
        int ret;

        if (likely(!FMODE_FSNOTIFY_PERM(file->f_mode)))
                return 0;
...

static inline int fsnotify_file(struct file *file, __u32 mask)
{
        if (FMODE_FSNOTIFY_NONE(file->f_mode))
                return 0;
...

BTW, I prefer using PERM,HSM instead of the FSNOTIFY_PRIO_ names for brevity,
but also because at the moment of this patch FMODE_NONOTIFY_PERM means
"suppress permission events if there are no listeners with priority >=
FSNOTIFY_PRIO_CONTENT at all on any objects of the filesystem".

It does NOT mean that there ARE permission event watchers on the file's
sb/mnt/inode or parent, but what the users of the flag really care about is
whether the specific file is being watched for permission events.

I was contemplating whether we should also add, for permission watchers on
the specific file, the same kind of open-time check that the following
patches add for pre-content watchers:

        /*
         * Permission events is a super set of pre-content events, so if there
         * are no permission event watchers, there are also no pre-content event
         * watchers and this is implied from the single FMODE_NONOTIFY_PERM bit.
         */
        if (likely(!fsnotify_sb_has_priority_watchers(sb, FSNOTIFY_PRIO_CONTENT)))
                return FMODE_NONOTIFY_PERM;

+       /*
+        * There are content watchers in the filesystem, but are there
+        * permission event watchers on this specific file?
+        */
+       if (likely(!fsnotify_file_object_watched(file,
+                                                ALL_FSNOTIFY_PERM_EVENTS)))
+               return FMODE_NONOTIFY_PERM;
+

I decided not to stretch the behavior change too much. Also, since
Anti-malware permission watchers often watch all the mounts of a filesystem,
there is probably little to gain from this extra check.
But we can reconsider this in the future. WDYT?

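For comparison, the two variants of the open-time decision can be modeled in
userspace roughly as below. This is only a sketch under stated assumptions:
struct watch_state and sketch_file_notify_mode() are hypothetical, the two
booleans stand in for the results of fsnotify_sb_has_priority_watchers(sb,
FSNOTIFY_PRIO_CONTENT) and fsnotify_file_object_watched(file,
ALL_FSNOTIFY_PERM_EVENTS), and the final "return 0" ignores the pre-content
decision that the following patches add.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's fmode_t (assumption for this sketch). */
typedef unsigned int fmode_t;

#define FMODE_NONOTIFY_PERM     ((fmode_t)(1u << 26))   /* bit from the quoted patch */

/* Hypothetical snapshot of the watcher state at open time. */
struct watch_state {
        /* any listener with priority >= FSNOTIFY_PRIO_CONTENT on the filesystem */
        bool sb_has_content_prio_watchers;
        /* a permission event watcher covers this file's sb/mnt/inode/parent */
        bool file_watched_for_perm;
};

/*
 * Hypothetical model of the open-time decision.
 * per_file_check == false: behavior described for this patch.
 * per_file_check == true:  behavior with the contemplated extra check.
 */
static fmode_t sketch_file_notify_mode(const struct watch_state *ws,
                                       bool per_file_check)
{
        /* No content-priority listeners at all: suppress permission events. */
        if (!ws->sb_has_content_prio_watchers)
                return FMODE_NONOTIFY_PERM;

        /* Optional: listeners exist, but none is watching this file. */
        if (per_file_check && !ws->file_watched_for_perm)
                return FMODE_NONOTIFY_PERM;

        /* Simplification: pre-content handling is not modeled here. */
        return 0;
}

/* The two variants differ only when listeners exist but skip this file. */
static void sketch_selftest(void)
{
        struct watch_state elsewhere = { true, false };

        assert(sketch_file_notify_mode(&elsewhere, false) == 0);
        assert(sketch_file_notify_mode(&elsewhere, true) == FMODE_NONOTIFY_PERM);
}
```

In this model the extra check only matters when a permission listener is
present on the filesystem but does not watch the opened file, which is the
case argued above to be uncommon for Anti-malware setups.
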
In any case, IMO the language of FMODE_FSNOTIFY_PERM() matches the meaning
of the users better and makes the code easier to understand.
FMODE_FSNOTIFY_HSM() is debatable, but at least it is short ;)

Anyway, I will send v2 with your suggestions.

Thanks,
Amir.