On Thu, Aug 26, 2021 at 12:50 AM Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> wrote: > > Amir Goldstein <amir73il@xxxxxxxxx> writes: > > > On Wed, Aug 25, 2021 at 9:40 PM Gabriel Krisman Bertazi > > <krisman@xxxxxxxxxxxxx> wrote: > >> > >> Amir Goldstein <amir73il@xxxxxxxxx> writes: > >> > >> > On Fri, Aug 13, 2021 at 12:41 AM Gabriel Krisman Bertazi > >> > <krisman@xxxxxxxxxxxxx> wrote: > >> >> > >> >> Some file system events (i.e. FS_ERROR) might not be associated with an > >> >> inode. For these, it makes sense to associate them directly with the > >> >> super block of the file system they apply to. This patch allows the > >> >> event to be reported with a NULL inode, by recovering the superblock > >> >> directly from the data field, if needed. > >> >> > >> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> > >> >> > >> >> -- > >> >> Changes since v5: > >> >> - add fsnotify_data_sb handle to retrieve sb from the data field. (jan) > >> >> --- > >> >> fs/notify/fsnotify.c | 16 +++++++++++++--- > >> >> 1 file changed, 13 insertions(+), 3 deletions(-) > >> >> > >> >> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > >> >> index 30d422b8c0fc..536db02cb26e 100644 > >> >> --- a/fs/notify/fsnotify.c > >> >> +++ b/fs/notify/fsnotify.c > >> >> @@ -98,6 +98,14 @@ void fsnotify_sb_delete(struct super_block *sb) > >> >> fsnotify_clear_marks_by_sb(sb); > >> >> } > >> >> > >> >> +static struct super_block *fsnotify_data_sb(const void *data, int data_type) > >> >> +{ > >> >> + struct inode *inode = fsnotify_data_inode(data, data_type); > >> >> + struct super_block *sb = inode ? inode->i_sb : NULL; > >> >> + > >> >> + return sb; > >> >> +} > >> >> + > >> >> /* > >> >> * Given an inode, first check if we care what happens to our children. Inotify > >> >> * and dnotify both tell their parents about events. If we care about any event > >> >> @@ -455,8 +463,10 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) > >> >> * @file_name is relative to > >> >> * @file_name: optional file name associated with event > >> >> * @inode: optional inode associated with event - > >> >> - * either @dir or @inode must be non-NULL. > >> >> - * if both are non-NULL event may be reported to both. > >> >> + * If @dir and @inode are NULL, @data must have a type that > >> >> + * allows retrieving the file system associated with this > >> > > >> > Irrelevant comment. sb must always be available from @data. > >> > > >> >> + * event. if both are non-NULL event may be reported to > >> >> + * both. > >> >> * @cookie: inotify rename cookie > >> >> */ > >> >> int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, > >> >> @@ -483,7 +493,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, > >> >> */ > >> >> parent = dir; > >> >> } > >> >> - sb = inode->i_sb; > >> >> + sb = inode ? inode->i_sb : fsnotify_data_sb(data, data_type); > >> > > >> > const struct path *path = fsnotify_data_path(data, data_type); > >> > + const struct super_block *sb = fsnotify_data_sb(data, data_type); > >> > > >> > All the games with @data @inode and @dir args are irrelevant to this. > >> > sb should always be available from @data and it does not matter > >> > if fsnotify_data_inode() is the same as @inode, @dir or neither. > >> > All those inodes are anyway on the same sb. > >> > >> Hi Amir, > >> > >> I think this is actually necessary. I could identify at least one event > >> (FS_CREATE | FS_ISDIR) where fsnotify is invoked with a NULL data field. > >> In that case, fsnotify_dirent is called with a negative dentry from > >> vfs_mkdir(). I'm not sure why exactly the dentry is negative after the > > > > That doesn't sound right at all. > > Are you sure about this? > > Which filesystem was this mkdir called on? > > You should be able to reproduce it on top of mainline if you pick only this > patch and do the change you suggested: > > - sb = inode->i_sb; > + sb = fsnotify_data_sb(data, data_type); > > And then boot a Debian stable with systemd. The notification happens on > the cgroup pseudo-filesystem (/sys/fs/cgroup), which is being monitored > by systemd itself. The event that arrives with a NULL data is telling the > directory /sys/fs/cgroup/*/ about the creation of directory > `init.scope`. > > The change above triggers the following null dereference of struct > super_block, since data is NULL. > > I will keep looking but you might be able to answer it immediately... Yes, I see what is going on. cgroupfs is a sort of kernfs and kernfs_iop_mkdir() does not instantiate the negative dentry. Instead, kernfs_dop_revalidate() always invalidates negative dentries to force re-lookup to find the inode. Documentation/filesystems/vfs.rst says on create() and friends: "...you will probably call d_instantiate() with the dentry and the newly created inode..." So this behavior seems legit. Meaning that we have made a wrong assumption in fsnotify_create() and fsnotify_mkdir(). Please note the comment above fsnotify_link() which anticipates negative dentries. I've audited the fsnotify backends and it seems that the WARN_ON(!inode) in kernel/audit_* is the only immediate implication of negative dentry with FS_CREATE. I am the one who added these WARN_ON(), so I will remove them. I think that missing inode in an FS_CREATE event really breaks audit on kernfs, but not sure if that is a valid use case (Paul?). Anyway, regarding your patch, I still prefer the solution proposed by Jan, but not with a different implementation of fsnotify_data_sb(). Please see branch fsnotify_data_sb[1] with the proposed fixes. The fixes assert the statement that "sb should always be available from @data", regardless of kernfs anomaly. If this works for you, please prepend those patches to your next submission. Regarding the state of this patch set in general, I must admit that I wasn't able to follow if a conclusion was reached about the lifetime management of fsnotify_error_event and associated sb mark. Jan is going out on vacation and I think there is little point in spinning another patch set revision before this issue is settled with Jan. Thanks, Amir. [1] https://github.com/amir73il/linux/commits/fsnotify_data_sb