Re: multiuser mount option regression

Satadru Pramanik <satadru@xxxxxxxxx> · Thu, 17 Mar 2022 02:23:19 -0400

I am now testing Ronnie's patch from earlier today. I'm putting the
machine to sleep now and will check later this morning whether the
mount has broken.

I tried just reverting 73f9bfbe3d818bb52266d5c9f3ba57d97842ffe7 in
5.17-rc8, but it broke cifs mounting entirely with the error I
mentioned earlier today:

[  242.560881] INFO: task mount.smb3:3219 blocked for more than 120 seconds.
[  242.560901]       Tainted: P           OE     5.17.0-051700rc8-generic #202203132130
[  242.560904] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.560907] task:mount.smb3      state:D stack:    0 pid: 3219 ppid:     1 flags:0x00004006
[  242.560914] Call Trace:
[  242.560918]  <TASK>
[  242.560927]  __schedule+0x240/0x5a0
[  242.560939]  schedule+0x55/0xd0
[  242.560941]  schedule_preempt_disabled+0x15/0x20
[  242.560944]  __mutex_lock.constprop.0+0x2e0/0x4b0
[  242.560949]  __mutex_lock_slowpath+0x13/0x20
[  242.560953]  mutex_lock+0x34/0x40
[  242.560958]  cifs_get_smb_ses+0x367/0xab0 [cifs]
[  242.561108]  ? __queue_delayed_work+0x5c/0x90
[  242.561120]  mount_get_conns+0x63/0x430 [cifs]
[  242.561182]  cifs_mount+0x86/0x420 [cifs]
[  242.561222]  cifs_smb3_do_mount+0x10d/0x320 [cifs]
[  242.561252]  ? cifs_smb3_do_mount+0x10d/0x320 [cifs]
[  242.561283]  ? vfs_parse_fs_string+0x7f/0xb0
[  242.561290]  smb3_get_tree+0x3e/0x70 [cifs]
[  242.561337]  vfs_get_tree+0x27/0xc0
[  242.561343]  do_new_mount+0x14b/0x1a0
[  242.561348]  path_mount+0x1d4/0x530
[  242.561350]  ? putname+0x55/0x60
[  242.561357]  __x64_sys_mount+0x108/0x140
[  242.561360]  do_syscall_64+0x59/0xc0
[  242.561368]  ? do_syscall_64+0x69/0xc0
[  242.561372]  ? handle_mm_fault+0xba/0x290
[  242.561376]  ? do_user_addr_fault+0x1dd/0x670
[  242.561382]  ? syscall_exit_to_user_mode+0x27/0x50
[  242.561385]  ? exit_to_user_mode_prepare+0x37/0xb0
[  242.561392]  ? irqentry_exit_to_user_mode+0x9/0x20
[  242.561394]  ? irqentry_exit+0x33/0x40
[  242.561397]  ? exc_page_fault+0x89/0x180
[  242.561399]  ? asm_exc_page_fault+0x8/0x30
[  242.561405]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  242.561409] RIP: 0033:0x7f42af11ceae
[  242.561414] RSP: 002b:00007fff6af66c48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[  242.561418] RAX: ffffffffffffffda RBX: 000055dcbe40beb0 RCX: 00007f42af11ceae
[  242.561420] RDX: 000055dcbe1a447e RSI: 000055dcbe1a44da RDI: 00007fff6af67ea6
[  242.561421] RBP: 0000000000000000 R08: 000055dcbe40beb0 R09: 000055dcbe40cf40
[  242.561423] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fff6af67e9b
[  242.561424] R13: 00007f42af237000 R14: 00007f42af23990f R15: 000055dcbe40cf40
[  242.561427]  </TASK>

On Thu, Mar 17, 2022 at 2:03 AM Steve French <smfrench@xxxxxxxxx> wrote:
>
> I narrowed the multiuser mount regression (which Ronnie had
> mentioned) down to this patch, one of the first applied in the 5.17
> merge window for cifs.ko. I am curious whether it is also related to
> the hard-to-reproduce reconnect issue that the regression tracker was
> monitoring.
>
> commit 73f9bfbe3d818bb52266d5c9f3ba57d97842ffe7 (HEAD -> tmp)
> Author: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> Date:   Mon Jul 19 17:37:52 2021 +0000
>
>     cifs: maintain a state machine for tcp/smb/tcon sessions
>
>     If functions like cifs_negotiate_protocol, cifs_setup_session,
>     cifs_tree_connect are called in parallel on different channels,
>     each of these will execute the requests. This may be unnecessary
>     in some cases, and only the first caller may need to do the work.
>
>     This is achieved by having more states for the tcp/smb/tcon session
>     status fields. And tracking the state of reconnection based on the
>     state machine.
>
>     For example:
>     for tcp connections:
>     CifsNew/CifsNeedReconnect ->
>       CifsNeedNegotiate ->
>         CifsInNegotiate ->
>           CifsNeedSessSetup ->
>             CifsInSessSetup ->
>               CifsGood
>
>     for smb sessions:
>     CifsNew/CifsNeedReconnect ->
>       CifsGood
>
>     for tcons:
>     CifsNew/CifsNeedReconnect ->
>       CifsInFilesInvalidate ->
>         CifsNeedTcon ->
>           CifsInTcon ->
>             CifsGood
>
>     If any channel reconnect sees that it's in the middle of
>     transition to CifsGood, then they can skip the function.
>
>
>
> --
> Thanks,
>
> Steve
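
For anyone following the thread, here is a rough sketch of the "only the
first caller does the work" idea from the quoted commit message, using
the tcp connection chain as the example. This is not the cifs.ko code:
the names conn_status, struct conn, conn_claim, conn_negotiate and
conn_setup_session are hypothetical, and a C11 compare-and-swap stands
in for whatever locking the real patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified states modeled on the tcp connection chain
 * in the commit message; not the kernel's actual enum. */
enum conn_status {
	CONN_NEW,
	CONN_NEED_RECONNECT,
	CONN_NEED_NEGOTIATE,
	CONN_IN_NEGOTIATE,
	CONN_NEED_SESS_SETUP,
	CONN_IN_SESS_SETUP,
	CONN_GOOD,
};

struct conn {
	_Atomic int status;	/* holds an enum conn_status value */
};

/* Try to move the connection from 'from' to 'to'. Returns true only for
 * the single caller whose compare-and-swap succeeds. */
bool conn_claim(struct conn *c, enum conn_status from, enum conn_status to)
{
	int expected = from;

	return atomic_compare_exchange_strong(&c->status, &expected, to);
}

/* Returns true if this caller did the negotiate step, false if it was
 * skipped because another channel is doing it or it is already done. */
bool conn_negotiate(struct conn *c)
{
	if (!conn_claim(c, CONN_NEED_NEGOTIATE, CONN_IN_NEGOTIATE))
		return false;

	/* A real implementation would send the negotiate request here. */
	atomic_store(&c->status, CONN_NEED_SESS_SETUP);
	return true;
}

/* Same pattern for the session setup step. */
bool conn_setup_session(struct conn *c)
{
	if (!conn_claim(c, CONN_NEED_SESS_SETUP, CONN_IN_SESS_SETUP))
		return false;

	/* A real implementation would send the session setup request here. */
	atomic_store(&c->status, CONN_GOOD);
	return true;
}
```

With a connection starting in CONN_NEED_NEGOTIATE, the first call to
conn_negotiate() does the work and a second call returns false without
doing anything. The sketch has no error path: if a step failed after the
state had moved to an "In" state and was never reset, later callers would
keep skipping it, so a real implementation has to handle that case.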