On Wed 20-11-24 17:12:21, Amir Goldstein wrote:
> On Wed, Nov 20, 2024 at 4:53 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Fri 15-11-24 10:30:15, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@xxxxxxxxx>
> > >
> > > Legacy inotify/fanotify listeners can add watches for events on inode,
> > > parent or mount and expect to get events (e.g. FS_MODIFY) on files that
> > > were already open at the time of setting up the watches.
> > >
> > > fanotify permission events are typically used by Anti-malware software,
> > > that is watching the entire mount and it is not common to have more than
> > > one Anti-malware engine installed on a system.
> > >
> > > To reduce the overhead of the fsnotify_file_perm() hooks on every file
> > > access, relax the semantics of the legacy FAN_ACCESS_PERM event to generate
> > > events only if there were *any* permission event listeners on the
> > > filesystem at the time that the file was opened.
> > >
> > > The new semantic is implemented by extending the FMODE_NONOTIFY bit into
> > > two FMODE_NONOTIFY_* bits, that are used to store a mode for which of the
> > > events types to report.
> > >
> > > This is going to apply to the new fanotify pre-content events in order
> > > to reduce the cost of the new pre-content event vfs hooks.
> > >
> > > Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > > Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wj8L=mtcRTi=NECHMGfZQgXOp_uix1YVh04fEmrKaMnXA@xxxxxxxxxxxxxx/
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >
> > FWIW I've ended up somewhat massaging this patch (see below).
> >
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 23bd058576b1..8e5c783013d2 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -173,13 +173,14 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> > >
> > >  #define FMODE_NOREUSE		((__force fmode_t)(1 << 23))
> > >
> > > -/* FMODE_* bit 24 */
> > > -
> > >  /* File is embedded in backing_file object */
> > > -#define FMODE_BACKING		((__force fmode_t)(1 << 25))
> > > +#define FMODE_BACKING		((__force fmode_t)(1 << 24))
> > >
> > > -/* File was opened by fanotify and shouldn't generate fanotify events */
> > > -#define FMODE_NONOTIFY		((__force fmode_t)(1 << 26))
> > > +/* File shouldn't generate fanotify pre-content events */
> > > +#define FMODE_NONOTIFY_HSM	((__force fmode_t)(1 << 25))
> > > +
> > > +/* File shouldn't generate fanotify permission events */
> > > +#define FMODE_NONOTIFY_PERM	((__force fmode_t)(1 << 26))
> >
> > Firstly, I've kept FMODE_NONOTIFY to stay a single bit instead of two bit
> > constant. I've seen too many bugs caused by people expecting the constant
> > has a single bit set when it actually had more in my life. So I've ended up
> > with:
> >
> > +/*
> > + * Together with FMODE_NONOTIFY_PERM defines which fsnotify events shouldn't be
> > + * generated (see below)
> > + */
> > +#define FMODE_NONOTIFY		((__force fmode_t)(1 << 25))
> > +
> > +/*
> > + * Together with FMODE_NONOTIFY defines which fsnotify events shouldn't be
> > + * generated (see below)
> > + */
> > +#define FMODE_NONOTIFY_PERM	((__force fmode_t)(1 << 26))
> >
> > and
> >
> > +/*
> > + * The two FMODE_NONOTIFY* define which fsnotify events should not be generated
> > + * for a file. These are the possible values of (f->f_mode &
> > + * FMODE_FSNOTIFY_MASK) and their meaning:
> > + *
> > + * FMODE_NONOTIFY - suppress all (incl. non-permission) events.
> > + * FMODE_NONOTIFY_PERM - suppress permission (incl. pre-content) events.
> > + * FMODE_NONOTIFY | FMODE_NONOTIFY_PERM - suppress only pre-content events.
> > + */
> > +#define FMODE_FSNOTIFY_MASK \
> > +	(FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)
> > +
> > +#define FMODE_FSNOTIFY_NONE(mode) \
> > +	((mode & FMODE_FSNOTIFY_MASK) == FMODE_NONOTIFY)
> > +#define FMODE_FSNOTIFY_PERM(mode) \
> > +	(!(mode & FMODE_NONOTIFY_PERM))
>
> That looks incorrect -
> It gives the wrong value for FMODE_NONOTIFY | FMODE_NONOTIFY_PERM
>
> should be:
> != FMODE_NONOTIFY_PERM &&
> != FMODE_NONOTIFY
>
> The simplicity of the single bit test is for permission events
> is why I chose my model, but I understand your reasoning.

Ah, thanks for catching this! I've fixed this to:

+#define FMODE_FSNOTIFY_PERM(mode) \
+	((mode & FMODE_FSNOTIFY_MASK) == 0 || \
+	 (mode & FMODE_FSNOTIFY_MASK) == (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM))

It is not a single bit test so it ends up being:

   0x0000000060180345 <+101>:	mov    0x20(%r12),%edx
   0x000000006018034a <+106>:	and    $0x6000000,%edx
   0x0000000060180350 <+112>:	je     0x6018035a <rw_verify_area+122>
   0x0000000060180352 <+114>:	cmp    $0x6000000,%edx
   0x0000000060180358 <+120>:	jne    0x6018032e <rw_verify_area+78>

But I guess that's not terrible either.

> > +#define FMODE_FSNOTIFY_HSM(mode) \
> > +	((mode & FMODE_FSNOTIFY_MASK) == 0)
> >
> > Also I've moved file_set_fsnotify_mode() out of line into fsnotify.c. The
> > function gets quite big and the call is not IMO so expensive to warrant
> > inlining. Furthermore it saves exporting some fsnotify internals to modules
> > (in later patches).
>
> Sounds good.
> Since you wanted to refrain from defining a two bit constant,
> I wonder how you annotated for NONOTIFY_HSM case
>
>    return FMODE_NONOTIFY | FMODE_NONOTIFY_PERM;

I'm not sure I understand. What do you mean by "annotated"?

It is not that I object to "two bit constants". FMODE_FSNOTIFY_MASK is a
two-bit constant and a good one. But the name clearly suggests it is not a
single bit constant.
When you have all FMODE_FOO and FMODE_BAR things single bit except for
FMODE_BAZ which is multi-bit, then this is IMHO a recipe for problems and I
rather prefer explicitly spelling the combination out as FMODE_NONOTIFY |
FMODE_NONOTIFY_PERM in the few places that need this instead of hiding it
behind some other name.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
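
For anyone wanting to check the two-bit encoding discussed above outside the
kernel, here is a minimal userspace sketch. It assumes fmode_t is a plain
uint32_t and drops the __force annotation; the bit positions and macro bodies
follow the ones quoted in this thread, with extra parentheses around the macro
argument. FMODE_FSNOTIFY_PERM_V1 is a hypothetical name used only here for the
first, incorrect single-bit version, so the two can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: userspace stand-in for the kernel's fmode_t. */
typedef uint32_t fmode_t;

/* Bit positions as quoted in the thread. */
#define FMODE_NONOTIFY		((fmode_t)(1u << 25))
#define FMODE_NONOTIFY_PERM	((fmode_t)(1u << 26))

#define FMODE_FSNOTIFY_MASK \
	(FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)

/* The mask matches the $0x6000000 immediate in the disassembly above. */
_Static_assert(FMODE_FSNOTIFY_MASK == 0x6000000u, "unexpected mask value");

/* True when all events (incl. non-permission ones) are suppressed. */
#define FMODE_FSNOTIFY_NONE(mode) \
	(((mode) & FMODE_FSNOTIFY_MASK) == FMODE_NONOTIFY)

/*
 * Corrected test: permission events are generated when nothing is
 * suppressed, or when only pre-content events are suppressed.
 */
#define FMODE_FSNOTIFY_PERM(mode) \
	((((mode) & FMODE_FSNOTIFY_MASK) == 0) || \
	 (((mode) & FMODE_FSNOTIFY_MASK) == \
	  (FMODE_NONOTIFY | FMODE_NONOTIFY_PERM)))

/* Pre-content events are generated only when nothing is suppressed. */
#define FMODE_FSNOTIFY_HSM(mode) \
	(((mode) & FMODE_FSNOTIFY_MASK) == 0)

/* Hypothetical name: the first, single-bit version, kept for comparison. */
#define FMODE_FSNOTIFY_PERM_V1(mode) \
	(!((mode) & FMODE_NONOTIFY_PERM))
```

With these definitions, FMODE_FSNOTIFY_PERM_V1() is false for
FMODE_NONOTIFY | FMODE_NONOTIFY_PERM and true for FMODE_NONOTIFY alone, which
is the wrong answer in both cases under the semantics in the comment block;
the corrected FMODE_FSNOTIFY_PERM() gives true and false respectively.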