On Fri, 2023-08-11 at 13:49 -0300, Paulo Alcantara wrote: > Jeff Layton <jlayton@xxxxxxxxxx> writes: > > > Thanks for the confirmation. There were some oopses on some RHEL8 (5.14 > > based kernels). The stack looked something like this: > > > > PID: 2415716 TASK: ffff937139090000 CPU: 3 COMMAND: "ls" > > #0 [ffff9ef946b23728] machine_kexec at ffffffffac867cfe > > #1 [ffff9ef946b23780] __crash_kexec at ffffffffac9ad94d > > #2 [ffff9ef946b23848] crash_kexec at ffffffffac9ae881 > > #3 [ffff9ef946b23860] oops_end at ffffffffac8274f1 > > #4 [ffff9ef946b23880] no_context at ffffffffac879a03 > > #5 [ffff9ef946b238d8] __bad_area_nosemaphore at ffffffffac879d64 > > #6 [ffff9ef946b23920] do_page_fault at ffffffffac87a617 > > #7 [ffff9ef946b23950] page_fault at ffffffffad20111e > > [exception RIP: cifs_mount+1126] > > RIP: ffffffffc08e8826 RSP: ffff9ef946b23a00 RFLAGS: 00010246 > > RAX: 0000000000000000 RBX: ffff936c8221ea00 RCX: ffff936f8018b320 > > RDX: 0000000000000001 RSI: ffff936f8018b420 RDI: ffff936c8221ea00 > > RBP: ffff9ef946b23a90 R8: 5346445756535c5c R9: 6765642e50313031 > > R10: 65622e666f6f7267 R11: 0063696c6275505c R12: ffff9370b192fc00 > > R13: ffff936f8018b420 R14: 00000000003097ad R15: 0000000000000000 > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > > #8 [ffff9ef946b23a98] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs] > > #9 [ffff9ef946b23aa0] cifs_smb3_do_mount at ffffffffc08d11f2 [cifs] > > #10 [ffff9ef946b23b08] smb3_get_tree at ffffffffc0930ae0 [cifs] > > #11 [ffff9ef946b23b30] vfs_get_tree at ffffffffacb52365 > > #12 [ffff9ef946b23b50] fc_mount at ffffffffacb7485e > > #13 [ffff9ef946b23b60] vfs_kern_mount at ffffffffacb748ec > > #14 [ffff9ef946b23b80] cifs_dfs_do_automount at ffffffffc093552e [cifs] > > #15 [ffff9ef946b23bc0] cifs_dfs_d_automount at ffffffffc0935880 [cifs] > > #16 [ffff9ef946b23bd0] follow_managed at ffffffffacb5bdaf > > #17 [ffff9ef946b23c10] lookup_fast at ffffffffacb5c7e5 > > #18 [ffff9ef946b23c68] walk_component at ffffffffacb5d258 > > #19 [ffff9ef946b23cc8] path_lookupat at ffffffffacb5e215 > > #20 [ffff9ef946b23d28] filename_lookup at ffffffffacb62710 > > #21 [ffff9ef946b23e40] vfs_statx at ffffffffacb55874 > > #22 [ffff9ef946b23e98] __do_sys_statx at ffffffffacb5692b > > #23 [ffff9ef946b23f38] do_syscall_64 at ffffffffac8043ab > > #24 [ffff9ef946b23f50] entry_SYSCALL_64_after_hwframe at > > ffffffffad2000a9 > > RIP: 00007ff6a2637edf RSP: 00007ffe040017d0 RFLAGS: 00000246 > > RAX: ffffffffffffffda RBX: 00007ffe04001910 RCX: 00007ff6a2637edf > > RDX: 0000000000000100 RSI: 00007ffe04001910 RDI: 00000000ffffff9c > > RBP: 0000000000000100 R8: 00007ffe040017f0 R9: 00000000ffffff9c > > R10: 0000000000000002 R11: 0000000000000246 R12: 00007ffe040017f0 > > R13: 0000000000000000 R14: 0000000000000003 R15: 000055ce1a3ae1b8 > > ORIG_RAX: 000000000000014c CS: 0033 SS: 002b > > > > Analysis of the vmcore by Roberto showed that we had ended up past that > > point with tcon==NULL and rc==0. > > Interesting. Thanks for sharing the backtrace! > > So it actually ended up with NULL @tcon and rc == 0 while mounting a DFS > link. Nice catch! > > > Steve's patch would have fixed the panic there, but I think the host > > would have ended up with a successful mount, but with a broken > > superblock. The current code seems a bit less fragile, and I didn't see > > any similar brokenness there, but I didn't look too hard either. > > Yeah, makes sense. > > > In any case, we'll plan to fix this up with a one-off in RHEL/Centos. > > Thanks again for the sanity check! > > Would you mind to propose a patch that fixes the above and mark it for > v5.14..v5.15? Sounds good. One of us will make sure that happens too. Thanks! -- Jeff Layton <jlayton@xxxxxxxxxx>