Re: [PATCH v6 09/21] fsnotify: Allow events reported with an empty inode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Amir Goldstein <amir73il@xxxxxxxxx> writes:

> On Wed, Aug 25, 2021 at 9:40 PM Gabriel Krisman Bertazi
> <krisman@xxxxxxxxxxxxx> wrote:
>>
>> Amir Goldstein <amir73il@xxxxxxxxx> writes:
>>
>> > On Fri, Aug 13, 2021 at 12:41 AM Gabriel Krisman Bertazi
>> > <krisman@xxxxxxxxxxxxx> wrote:
>> >>
>> >> Some file system events (i.e. FS_ERROR) might not be associated with an
>> >> inode.  For these, it makes sense to associate them directly with the
>> >> super block of the file system they apply to.  This patch allows the
>> >> event to be reported with a NULL inode, by recovering the superblock
>> >> directly from the data field, if needed.
>> >>
>> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx>
>> >>
>> >> --
>> >> Changes since v5:
>> >>   - add fsnotify_data_sb handle to retrieve sb from the data field. (jan)
>> >> ---
>> >>  fs/notify/fsnotify.c | 16 +++++++++++++---
>> >>  1 file changed, 13 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
>> >> index 30d422b8c0fc..536db02cb26e 100644
>> >> --- a/fs/notify/fsnotify.c
>> >> +++ b/fs/notify/fsnotify.c
>> >> @@ -98,6 +98,14 @@ void fsnotify_sb_delete(struct super_block *sb)
>> >>         fsnotify_clear_marks_by_sb(sb);
>> >>  }
>> >>
>> >> +static struct super_block *fsnotify_data_sb(const void *data, int data_type)
>> >> +{
>> >> +       struct inode *inode = fsnotify_data_inode(data, data_type);
>> >> +       struct super_block *sb = inode ? inode->i_sb : NULL;
>> >> +
>> >> +       return sb;
>> >> +}
>> >> +
>> >>  /*
>> >>   * Given an inode, first check if we care what happens to our children.  Inotify
>> >>   * and dnotify both tell their parents about events.  If we care about any event
>> >> @@ -455,8 +463,10 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
>> >>   *             @file_name is relative to
>> >>   * @file_name: optional file name associated with event
>> >>   * @inode:     optional inode associated with event -
>> >> - *             either @dir or @inode must be non-NULL.
>> >> - *             if both are non-NULL event may be reported to both.
>> >> + *             If @dir and @inode are NULL, @data must have a type that
>> >> + *             allows retrieving the file system associated with this
>> >
>> > Irrelevant comment. sb must always be available from @data.
>> >
>> >> + *             event.  if both are non-NULL event may be reported to
>> >> + *             both.
>> >>   * @cookie:    inotify rename cookie
>> >>   */
>> >>  int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>> >> @@ -483,7 +493,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>> >>                  */
>> >>                 parent = dir;
>> >>         }
>> >> -       sb = inode->i_sb;
>> >> +       sb = inode ? inode->i_sb : fsnotify_data_sb(data, data_type);
>> >
>> >         const struct path *path = fsnotify_data_path(data, data_type);
>> > +       const struct super_block *sb = fsnotify_data_sb(data, data_type);
>> >
>> > All the games with @data @inode and @dir args are irrelevant to this.
>> > sb should always be available from @data and it does not matter
>> > if fsnotify_data_inode() is the same as @inode, @dir or neither.
>> > All those inodes are anyway on the same sb.
>>
>> Hi Amir,
>>
>> I think this is actually necessary.  I could identify at least one event
>> (FS_CREATE | FS_ISDIR) where fsnotify is invoked with a NULL data field.
>> In that case, fsnotify_dirent is called with a negative dentry from
>> vfs_mkdir().  I'm not sure why exactly the dentry is negative after the
>
> That doesn't sound right at all.
> Are you sure about this?
> Which filesystem was this mkdir called on?

You should be able to reproduce it on top of mainline if you pick only this
patch and do the change you suggested:

 -       sb = inode->i_sb;
 +       sb = fsnotify_data_sb(data, data_type);

And then boot a Debian stable with systemd.  The notification happens on
the cgroup pseudo-filesystem (/sys/fs/cgroup), which is being monitored
by systemd itself.  The event that arrives with a NULL data is telling the
directory /sys/fs/cgroup/*/ about the creation of directory
`init.scope`.

The change above triggers the following null dereference of struct
super_block, since data is NULL.

I will keep looking but you might be able to answer it immediately...

fsnotify was called with:
  data_type=2
  mask=40000100
  data=0
  name=init.scope

The code looks like this:

        fsnotify_mkdir(dir, dentry) {
       	  fsnotify_dirent(inode, dentry, FS_CREATE | FS_ISDIR) {
	    fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0) {
	      fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie);
	    }
	  }
	}

The entire log:

BUG: kernel NULL pointer dereference, address: 00000000000003a8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 1 Comm: systemd Not tainted 5.14.0-rc5- #279
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:fsnotify+0x103/0x5e0
Code: 84 c2 04 00 00 83 f8 02 0f 85 c6 04 00 00 48 8b 04 24 31 db 48 85 c0 0f 84 b9 04 00 00 48 8b 48 28 48 85 c9 0f 84 ac 04 00 00 <48> 83 b9 a8 03 00 00 00 0f 84 e3 03 00 00 8b 81 a0 03 00 00 48 85
RSP: 0018:ffffaff800013e18 EFLAGS: 00010246
RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffaff800013ca8 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffefff
R10: 0000000000000001 R11: ffffaff800013c48 R12: ffff9bbb80467778
R13: 0000000040000100 R14: 00000000000001ed R15: 0000000000000000
FS:  00007f2e4d2c4900(0000) GS:ffff9bbbbbd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003a8 CR3: 0000000100054000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? cgroup_kn_unlock+0x33/0x80
 ? cgroup_mkdir+0x13e/0x410
 vfs_mkdir+0x16e/0x1d0
 do_mkdirat+0x8c/0x100
 do_syscall_64+0x3a/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2e4da91b07
Code: 1f 40 00 48 8b 05 89 f3 0c 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 f3 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc02877ad8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffc02877b50 RCX: 00007f2e4da91b07
RDX: 00007ffc02877980 RSI: 00000000000001ed RDI: 00005646df5f01d0
RBP: 00005646de9ed770 R08: 0000000000000001 R09: 0000000000000000
R10: 00005646df5f01c0 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffc02877b50 R15: 00007ffc02877c60
Modules linked in:
CR2: 00000000000003a8
---[ end trace 4642e1d1df9669cb ]---

>
>> mkdir, but it would be possible we are racing with the file removal, I
>
> No. Both vfs_mkdir() and vfs_rmdir() hold the dir inode lock (on
> parent).
>
>> guess?  It might be a bug in fsnotify that this case even happen, but
>> I'm not sure yet.
>
> fsnotify_data_inode() should not be NULL.
> fsnotify_handle_inode_event() passes fsnotify_data_inode() to
> callbacks like audit_watch_handle_event() that checks
> WARN_ON_ONCE(!inode)
>
>>
>> The easiest way around it is what I proposed: just use inode->i_sb if
>> data is NULL.  Since, as you said, data, inode and dir are all on the
>> same superblock, it should work, I think.
>>
>
> It would be papering over another issue.
> I don't mind if we use inode->i_sb as long as we understand the reason
> for what you are reporting.
>
> Please provide more information.



-- 
Gabriel Krisman Bertazi



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux