Re: [PATCH] cifs: missing null pointer check in cifs_mount

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 11 Aug 2023 12:26:55 -0400

On Fri, 2023-08-11 at 12:15 -0300, Paulo Alcantara wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
> 
> > On Wed, 2021-06-23 at 19:34 -0500, Steve French wrote:
> > > updated patch attached with Aurelien's suggestion.
> > > 
> > > On Wed, Jun 23, 2021 at 7:17 AM Paulo Alcantara <pc@xxxxxx> wrote:
> > > > 
> > > > Agreed.
> > > > 
> > > > On June 23, 2021 8:48:24 AM GMT-03:00, "Aurélien Aptel" <aaptel@xxxxxxxx> wrote:
> > > > > Steve French <smfrench@xxxxxxxxx> writes:
> > > > > > We weren't checking if tcon is null before setting dfs path,
> > > > > > although we check for null tcon in an earlier assignment statement.
> > > > > 
> > > > > If tcon is NULL there is no point in continuing in that function, we
> > > > > should have exited earlier.
> > > > > 
> > > > > If tcon is NULL it means mount_get_conns() failed so presumably rc will
> > > > > be != 0 and we would goto error.
> > > > > 
> > > > > I don't think this is needed. We could change the existing check after
> > > > > the loop to this you really want to be safe:
> > > > > 
> > > > >       if (rc || !tcon)
> > > > >               goto error;
> > > > > 
> > > > > 
> > > > > Cheers,
> > > 
> > > 
> > > 
> > 
> > I know this patch is ancient and the mainline code has marched on, but
> > it seems really suspicious to me.
> 
> Yes, it is.
> 
> > With this, we have cifs_mount returning 0, even though the superblock
> > hasn't been properly initialized. Is that expected? Shouldn't it return
> > an error in that case?
> 
> No, that isn't expected.  And yes, if @tcon would ever be NULL at that
> point, we should be returning an error instead.  Otherwise we'd end up
> dereferencing a NULL @tcon while trying to get an inode for the root
> dentry later.
> 
> However, by quickly looking at the old code -- on top of 162004a2f7ef --
> I don't see how we'd end up having a NULL @tcon with rc == 0 as
> mount_get_conns() would return -errno if it couldn't get a tcon.  Please
> correct me if I'm missing something.  Whether it is possibile or not,
> the NULL @tcon check is certainly missing a 'rc = -ENOENT' or some other
> error before bailing out as you've pointed out.

Thanks for the confirmation. There were some oopses on some RHEL8 (5.14
based kernels). The stack looked something like this:

PID: 2415716  TASK: ffff937139090000  CPU: 3    COMMAND: "ls"
 #0 [ffff9ef946b23728] machine_kexec at ffffffffac867cfe
 #1 [ffff9ef946b23780] __crash_kexec at ffffffffac9ad94d
 #2 [ffff9ef946b23848] crash_kexec at ffffffffac9ae881
 #3 [ffff9ef946b23860] oops_end at ffffffffac8274f1
 #4 [ffff9ef946b23880] no_context at ffffffffac879a03
 #5 [ffff9ef946b238d8] __bad_area_nosemaphore at ffffffffac879d64
 #6 [ffff9ef946b23920] do_page_fault at ffffffffac87a617
 #7 [ffff9ef946b23950] page_fault at ffffffffad20111e
    [exception RIP: cifs_mount+1126]
    RIP: ffffffffc08e8826  RSP: ffff9ef946b23a00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff936c8221ea00  RCX: ffff936f8018b320
    RDX: 0000000000000001  RSI: ffff936f8018b420  RDI: ffff936c8221ea00
    RBP: ffff9ef946b23a90   R8: 5346445756535c5c   R9: 6765642e50313031
    R10: 65622e666f6f7267  R11: 0063696c6275505c  R12: ffff9370b192fc00
    R13: ffff936f8018b420  R14: 00000000003097ad  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff9ef946b23a98] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs]
 #9 [ffff9ef946b23aa0] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs]
#10 [ffff9ef946b23b08] smb3_get_tree at ffffffffc0930ae0 [cifs]
#11 [ffff9ef946b23b30] vfs_get_tree at ffffffffacb52365
#12 [ffff9ef946b23b50] fc_mount at ffffffffacb7485e
#13 [ffff9ef946b23b60] vfs_kern_mount at ffffffffacb748ec
#14 [ffff9ef946b23b80] cifs_dfs_do_automount at ffffffffc093552e [cifs]
#15 [ffff9ef946b23bc0] cifs_dfs_d_automount at ffffffffc0935880 [cifs]
#16 [ffff9ef946b23bd0] follow_managed at ffffffffacb5bdaf
#17 [ffff9ef946b23c10] lookup_fast at ffffffffacb5c7e5
#18 [ffff9ef946b23c68] walk_component at ffffffffacb5d258
#19 [ffff9ef946b23cc8] path_lookupat at ffffffffacb5e215
#20 [ffff9ef946b23d28] filename_lookup at ffffffffacb62710
#21 [ffff9ef946b23e40] vfs_statx at ffffffffacb55874
#22 [ffff9ef946b23e98] __do_sys_statx at ffffffffacb5692b
#23 [ffff9ef946b23f38] do_syscall_64 at ffffffffac8043ab
#24 [ffff9ef946b23f50] entry_SYSCALL_64_after_hwframe at
ffffffffad2000a9
    RIP: 00007ff6a2637edf  RSP: 00007ffe040017d0  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffe04001910  RCX: 00007ff6a2637edf
    RDX: 0000000000000100  RSI: 00007ffe04001910  RDI: 00000000ffffff9c
    RBP: 0000000000000100   R8: 00007ffe040017f0   R9: 00000000ffffff9c
    R10: 0000000000000002  R11: 0000000000000246  R12: 00007ffe040017f0
    R13: 0000000000000000  R14: 0000000000000003  R15: 000055ce1a3ae1b8
    ORIG_RAX: 000000000000014c  CS: 0033  SS: 002b

Analysis of the vmcore by Roberto showed that we had ended up past that
point with tcon==NULL and rc==0.

Steve's patch would have fixed the panic there, but I think the host
would have ended up with a successful mount, but with a broken
superblock. The current code seems a bit less fragile, and I didn't see
any similar brokenness there, but I didn't look too hard either.

In any case, we'll plan to fix this up with a one-off in RHEL/Centos.
Thanks again for the sanity check!

-- 
Jeff Layton <jlayton@xxxxxxxxxx>