I'm testing Ronnie's patch from earlier today and will check later this
morning whether the mount has broken, since I'm putting the machine to
sleep now.

I tried simply reverting 73f9bfbe3d818bb52266d5c9f3ba57d97842ffe7 in
5.17-rc8, but that broke cifs mounting entirely, with the error I
mentioned earlier today:

[ 242.560881] INFO: task mount.smb3:3219 blocked for more than 120 seconds.
[ 242.560901] Tainted: P OE 5.17.0-051700rc8-generic #202203132130
[ 242.560904] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.560907] task:mount.smb3 state:D stack: 0 pid: 3219 ppid: 1 flags:0x00004006
[ 242.560914] Call Trace:
[ 242.560918] <TASK>
[ 242.560927] __schedule+0x240/0x5a0
[ 242.560939] schedule+0x55/0xd0
[ 242.560941] schedule_preempt_disabled+0x15/0x20
[ 242.560944] __mutex_lock.constprop.0+0x2e0/0x4b0
[ 242.560949] __mutex_lock_slowpath+0x13/0x20
[ 242.560953] mutex_lock+0x34/0x40
[ 242.560958] cifs_get_smb_ses+0x367/0xab0 [cifs]
[ 242.561108] ? __queue_delayed_work+0x5c/0x90
[ 242.561120] mount_get_conns+0x63/0x430 [cifs]
[ 242.561182] cifs_mount+0x86/0x420 [cifs]
[ 242.561222] cifs_smb3_do_mount+0x10d/0x320 [cifs]
[ 242.561252] ? cifs_smb3_do_mount+0x10d/0x320 [cifs]
[ 242.561283] ? vfs_parse_fs_string+0x7f/0xb0
[ 242.561290] smb3_get_tree+0x3e/0x70 [cifs]
[ 242.561337] vfs_get_tree+0x27/0xc0
[ 242.561343] do_new_mount+0x14b/0x1a0
[ 242.561348] path_mount+0x1d4/0x530
[ 242.561350] ? putname+0x55/0x60
[ 242.561357] __x64_sys_mount+0x108/0x140
[ 242.561360] do_syscall_64+0x59/0xc0
[ 242.561368] ? do_syscall_64+0x69/0xc0
[ 242.561372] ? handle_mm_fault+0xba/0x290
[ 242.561376] ? do_user_addr_fault+0x1dd/0x670
[ 242.561382] ? syscall_exit_to_user_mode+0x27/0x50
[ 242.561385] ? exit_to_user_mode_prepare+0x37/0xb0
[ 242.561392] ? irqentry_exit_to_user_mode+0x9/0x20
[ 242.561394] ? irqentry_exit+0x33/0x40
[ 242.561397] ? exc_page_fault+0x89/0x180
[ 242.561399] ? asm_exc_page_fault+0x8/0x30
[ 242.561405] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 242.561409] RIP: 0033:0x7f42af11ceae
[ 242.561414] RSP: 002b:00007fff6af66c48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 242.561418] RAX: ffffffffffffffda RBX: 000055dcbe40beb0 RCX: 00007f42af11ceae
[ 242.561420] RDX: 000055dcbe1a447e RSI: 000055dcbe1a44da RDI: 00007fff6af67ea6
[ 242.561421] RBP: 0000000000000000 R08: 000055dcbe40beb0 R09: 000055dcbe40cf40
[ 242.561423] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fff6af67e9b
[ 242.561424] R13: 00007f42af237000 R14: 00007f42af23990f R15: 000055dcbe40cf40
[ 242.561427] </TASK>

On Thu, Mar 17, 2022 at 2:03 AM Steve French <smfrench@xxxxxxxxx> wrote:
>
> I narrowed the regression for multiuser mounts down (which Ronnie had
> mentioned) to this patch (one of the first applied to 5.17 merge
> window for cifs.ko). I am curious whether this is also related to
> the hard to reproduce reconnect issue that the regression tracker was
> monitoring
>
> commit 73f9bfbe3d818bb52266d5c9f3ba57d97842ffe7 (HEAD -> tmp)
> Author: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> Date: Mon Jul 19 17:37:52 2021 +0000
>
>     cifs: maintain a state machine for tcp/smb/tcon sessions
>
>     If functions like cifs_negotiate_protocol, cifs_setup_session,
>     cifs_tree_connect are called in parallel on different channels,
>     each of these will be execute the requests. This maybe unnecessary
>     in some cases, and only the first caller may need to do the work.
>
>     This is achieved by having more states for the tcp/smb/tcon session
>     status fields. And tracking the state of reconnection based on the
>     state machine.
>
>     For example:
>     for tcp connections:
>     CifsNew/CifsNeedReconnect ->
>     CifsNeedNegotiate ->
>     CifsInNegotiate ->
>     CifsNeedSessSetup ->
>     CifsInSessSetup ->
>     CifsGood
>
>     for smb sessions:
>     CifsNew/CifsNeedReconnect ->
>     CifsGood
>
>     CifsNew/CifsNeedReconnect ->
>     CifsInFilesInvalidate ->
>     CifsNeedTcon ->
>     CifsInTcon ->
>     CifsGood
>
>     If any channel reconnect sees that it's in the middle of
>     transition to CifsGood, then they can skip the function.
>
>
>
> --
> Thanks,
>
> Steve
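
For anyone following the thread, here is a rough userspace sketch of the
"only the first caller does the work" idea from the quoted commit message,
using the tcp connection states it lists. This is not the cifs.ko code:
the enum layout and the helpers begin_negotiate() and finish_negotiate()
are hypothetical names made up for illustration, and the sketch is
single-threaded, so it assumes the caller already serializes access to
the status field.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * State names come from the quoted commit message. The enum itself, its
 * ordering, and both helpers below are hypothetical, not kernel code.
 */
enum conn_status {
    CifsNew,
    CifsNeedReconnect,
    CifsNeedNegotiate,
    CifsInNegotiate,
    CifsNeedSessSetup,
    CifsInSessSetup,
    CifsGood
};

/*
 * Hypothetical helper: claim the negotiate step.
 * Returns true if this caller should do the work, false if another
 * caller is already negotiating or the step is not needed.
 * Assumption: the caller holds whatever lock protects *status.
 */
static bool begin_negotiate(enum conn_status *status)
{
    if (*status != CifsNeedNegotiate)
        return false;
    *status = CifsInNegotiate;
    return true;
}

/*
 * Hypothetical helper: finish the negotiate step claimed above.
 * On success move on to session setup; on failure fall back so that a
 * later caller can retry.
 */
static void finish_negotiate(enum conn_status *status, bool ok)
{
    assert(*status == CifsInNegotiate);
    *status = ok ? CifsNeedSessSetup : CifsNeedNegotiate;
}
```

With two channels racing, the first call to begin_negotiate() returns true
and moves the connection to CifsInNegotiate; the second returns false and
skips the request. If the path that claimed the step never calls
finish_negotiate(), every later caller keeps skipping, which is the kind
of stuck state worth ruling out when chasing a hang like the one above.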