On Fri, 2023-08-11 at 12:58 -0400, Jeff Layton wrote: > On Fri, 2023-08-11 at 13:49 -0300, Paulo Alcantara wrote: > > Jeff Layton <jlayton@xxxxxxxxxx> writes: > > > > > Thanks for the confirmation. There were some oopses on some RHEL8 (5.14 > > > based kernels). The stack looked something like this: > > > > > > PID: 2415716 TASK: ffff937139090000 CPU: 3 COMMAND: "ls" > > > #0 [ffff9ef946b23728] machine_kexec at ffffffffac867cfe > > > #1 [ffff9ef946b23780] __crash_kexec at ffffffffac9ad94d > > > #2 [ffff9ef946b23848] crash_kexec at ffffffffac9ae881 > > > #3 [ffff9ef946b23860] oops_end at ffffffffac8274f1 > > > #4 [ffff9ef946b23880] no_context at ffffffffac879a03 > > > #5 [ffff9ef946b238d8] __bad_area_nosemaphore at ffffffffac879d64 > > > #6 [ffff9ef946b23920] do_page_fault at ffffffffac87a617 > > > #7 [ffff9ef946b23950] page_fault at ffffffffad20111e > > > [exception RIP: cifs_mount+1126] > > > RIP: ffffffffc08e8826 RSP: ffff9ef946b23a00 RFLAGS: 00010246 > > > RAX: 0000000000000000 RBX: ffff936c8221ea00 RCX: ffff936f8018b320 > > > RDX: 0000000000000001 RSI: ffff936f8018b420 RDI: ffff936c8221ea00 > > > RBP: ffff9ef946b23a90 R8: 5346445756535c5c R9: 6765642e50313031 > > > R10: 65622e666f6f7267 R11: 0063696c6275505c R12: ffff9370b192fc00 > > > R13: ffff936f8018b420 R14: 00000000003097ad R15: 0000000000000000 > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > > > #8 [ffff9ef946b23a98] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs] > > > #9 [ffff9ef946b23aa0] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs] > > > #10 [ffff9ef946b23b08] smb3_get_tree at ffffffffc0930ae0 [cifs] > > > #11 [ffff9ef946b23b30] vfs_get_tree at ffffffffacb52365 > > > #12 [ffff9ef946b23b50] fc_mount at ffffffffacb7485e > > > #13 [ffff9ef946b23b60] vfs_kern_mount at ffffffffacb748ec > > > #14 [ffff9ef946b23b80] cifs_dfs_do_automount at ffffffffc093552e [cifs] > > > #15 [ffff9ef946b23bc0] cifs_dfs_d_automount at ffffffffc0935880 [cifs] > > > #16 [ffff9ef946b23bd0] follow_managed at ffffffffacb5bdaf > > > #17 [ffff9ef946b23c10] lookup_fast at ffffffffacb5c7e5 > > > #18 [ffff9ef946b23c68] walk_component at ffffffffacb5d258 > > > #19 [ffff9ef946b23cc8] path_lookupat at ffffffffacb5e215 > > > #20 [ffff9ef946b23d28] filename_lookup at ffffffffacb62710 > > > #21 [ffff9ef946b23e40] vfs_statx at ffffffffacb55874 > > > #22 [ffff9ef946b23e98] __do_sys_statx at ffffffffacb5692b > > > #23 [ffff9ef946b23f38] do_syscall_64 at ffffffffac8043ab > > > #24 [ffff9ef946b23f50] entry_SYSCALL_64_after_hwframe at > > > ffffffffad2000a9 > > > RIP: 00007ff6a2637edf RSP: 00007ffe040017d0 RFLAGS: 00000246 > > > RAX: ffffffffffffffda RBX: 00007ffe04001910 RCX: 00007ff6a2637edf > > > RDX: 0000000000000100 RSI: 00007ffe04001910 RDI: 00000000ffffff9c > > > RBP: 0000000000000100 R8: 00007ffe040017f0 R9: 00000000ffffff9c > > > R10: 0000000000000002 R11: 0000000000000246 R12: 00007ffe040017f0 > > > R13: 0000000000000000 R14: 0000000000000003 R15: 000055ce1a3ae1b8 > > > ORIG_RAX: 000000000000014c CS: 0033 SS: 002b > > > > > > Analysis of the vmcore by Roberto showed that we had ended up past that > > > point with tcon==NULL and rc==0. > > > > Interesting. Thanks for sharing the backtrace! > > > > So it actually ended up with NULL @tcon and rc == 0 while mounting a DFS > > link. Nice catch! > > > > > Steve's patch would have fixed the panic there, but I think the host > > > would have ended up with a successful mount, but with a broken > > > superblock. The current code seems a bit less fragile, and I didn't see > > > any similar brokenness there, but I didn't look too hard either. > > > > Yeah, makes sense. > > > > > In any case, we'll plan to fix this up with a one-off in RHEL/Centos. > > > Thanks again for the sanity check! > > > > Would you mind to propose a patch that fixes the above and mark it for > > v5.14..v5.15? > > Sounds good. One of us will make sure that happens too. > FWIW, I took a look at v5.15.125 and I don't see the same bug there. It probably got fixed inadvertently with some other backporting. Looks like this is only a problem for older, non-stable-series kernels. The patch I created for RHEL8 is attached though, if you're interested. -- Jeff Layton <jlayton@xxxxxxxxxx>
From 9d97e2c37e98e893e38b3e64923ca68dc9c21e61 Mon Sep 17 00:00:00 2001 From: Jeffrey Layton <jlayton@xxxxxxxxxx> Date: Tue, 15 Aug 2023 09:42:21 -0400 Subject: [PATCH] cifs: fix bogus cifs_mount error handling in RHEL8 The patch that originally went in for this bug was bogus and could cause cifs_mount to return 0, even though it hadn't properly filled out the superblock. Ensure that we also return an error in this case. Paulo Alcantara suggested -ENOENT in the upstream discussion. Signed-off-by: Jeffrey Layton <jlayton@xxxxxxxxxx> --- fs/cifs/connect.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index f882b38d4ed3..0af0c0718153 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -3507,8 +3507,10 @@ int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx) rc = -ELOOP; } while (rc == -EREMOTE); - if (rc || !tcon) + if (rc || !tcon) { + rc = rc ? rc : -ENOENT; goto error; + } kfree(ref_path); /* -- 2.41.0