On Sun, Mar 28, 2021 at 06:56:24PM +0300, Amir Goldstein wrote: > Add a high level hook fsnotify_path_create() which is called from > syscall context where mount context is available, so that FAN_CREATE > event can be added to a mount mark mask. > > This high level hook is called in addition to fsnotify_create(), > fsnotify_mkdir() and fsnotify_link() hooks in vfs helpers where the mount > context is not available. > > In the context where fsnotify_path_create() will be called, a dentry flag > flag is set on the new dentry the suppress the FS_CREATE event in the vfs > level hooks. Ok, just to make sure this scheme would also work for overlay-style filesystems like ecryptfs where you possible generate two notify events: - in the ecryptfs layer - in the lower fs layer at least when you set a regular inode watch. If you set a mount watch you ideally would generate two events in both layers too, right? But afaict that wouldn't work. Say, someone creates a new link in ecryptfs the DENTRY_PATH_CREATE flag will be set on the new ecryptfs dentry and so no notify event will be generated for the ecryptfs layer again. Then ecryptfs calls vfs_link() to create a new dentry in the lower layer. The new dentry in the lower layer won't have DCACHE_PATH_CREATE set. Ok, that makes sense. But since vfs_link() doesn't have access to the mnt context itself you can't generate a notify event for the mount associated with the lower fs. This would cause people who a FAN_MARK_MOUNT watch on that lower fs mount to not get notified about creation events going through the ecryptfs layer. Is that right? Seems like this could be a problem. Christian > > This functionality was requested by Christian Brauner to replace > recursive inotify watches for detecting when some path was created under > an idmapped mount without having to monitor FAN_CREATE events in the > entire filesystem. > > In combination with more changes to allow unprivileged fanotify listener > to watch an idmapped mount, this functionality would be usable also by > nested container managers. > > Link: https://lore.kernel.org/linux-fsdevel/20210318143140.jxycfn3fpqntq34z@wittgenstein/ > Cc: Christian Brauner <christian.brauner@xxxxxxxxxx> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > --- > > Jan, > > After trying several different approaches, I finally realized that > making FAN_CREATE available for mount marks is not that hard and it could > be very useful IMO. > > Adding support for other "inode events" with mount mark, such as > FAN_ATTRIB, FAN_DELETE, FAN_MOVE may also be possible, but adding support > for FAN_CREATE was really easy due to the fact that all call sites are > already surrounded by filename_creat()/done_path_create() calls. > > Also, there is an inherent a-symetry between FAN_CREATE and other > events. All the rest of the events may be set when watching a postive > path, for example, to know when a path of a bind mount that was > "injected" to a container was moved or deleted, it is possible to start > watching that directory before injecting the bind mount. > > It is not possible to do the same with a "negative" path to know when > a positive dentry was instantiated at that path. > > This patch provides functionality that is independant of other changes, > but I also tested it along with other changes that demonstrate how it > would be utilized in userns setups [1][2]. > > As can be seen in dcache.h patch, this patch comes on top a revert patch > to reclaim an unused dentry flag. If you accept this proposal, I will > post the full series. > > Thanks, > Amir. > > [1] https://github.com/amir73il/linux/commits/fanotify_userns > [2] https://github.com/amir73il/inotify-tools/commits/fanotify_userns > > fs/namei.c | 21 ++++++++++++++++++++- > include/linux/dcache.h | 2 +- > include/linux/fanotify.h | 8 ++++---- > include/linux/fsnotify.h | 36 ++++++++++++++++++++++++++++++++++++ > 4 files changed, 61 insertions(+), 6 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 216f16e74351..cf979e956938 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3288,7 +3288,7 @@ static const char *open_last_lookups(struct nameidata *nd, > inode_lock_shared(dir->d_inode); > dentry = lookup_open(nd, file, op, got_write); > if (!IS_ERR(dentry) && (file->f_mode & FMODE_CREATED)) > - fsnotify_create(dir->d_inode, dentry); > + fsnotify_path_create(&nd->path, dentry); > if (open_flag & O_CREAT) > inode_unlock(dir->d_inode); > else > @@ -3560,6 +3560,20 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt, > return file; > } > > +static void d_set_path_create(struct dentry *dentry) > +{ > + spin_lock(&dentry->d_lock); > + dentry->d_flags |= DCACHE_PATH_CREATE; > + spin_unlock(&dentry->d_lock); > +} > + > +static void d_clear_path_create(struct dentry *dentry) > +{ > + spin_lock(&dentry->d_lock); > + dentry->d_flags &= ~DCACHE_PATH_CREATE; > + spin_unlock(&dentry->d_lock); > +} > + > static struct dentry *filename_create(int dfd, struct filename *name, > struct path *path, unsigned int lookup_flags) > { > @@ -3617,6 +3631,8 @@ static struct dentry *filename_create(int dfd, struct filename *name, > goto fail; > } > putname(name); > + /* Start "path create" context that ends in done_path_create() */ > + d_set_path_create(dentry); > return dentry; > fail: > dput(dentry); > @@ -3641,6 +3657,9 @@ EXPORT_SYMBOL(kern_path_create); > > void done_path_create(struct path *path, struct dentry *dentry) > { > + if (d_inode(dentry)) > + fsnotify_path_create(path, dentry); > + d_clear_path_create(dentry); > dput(dentry); > inode_unlock(path->dentry->d_inode); > mnt_drop_write(path->mnt); > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index 4225caa8cf02..d153793d5b95 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -213,7 +213,7 @@ struct dentry_operations { > #define DCACHE_SYMLINK_TYPE 0x00600000 /* Symlink (or fallthru to such) */ > > #define DCACHE_MAY_FREE 0x00800000 > -/* Was #define DCACHE_FALLTHRU 0x01000000 */ Indeed, that seems completely unused.