Re: [PATCH v2 0/4] SMB cached directory fixes around reconnection/unmounting

Steve French <smfrench@xxxxxxxxx> · Thu, 21 Nov 2024 14:59:50 -0600

I noticed a hang today in generic/246 immediately after a crash in
umount in the previous test, generic/241 running with 6.12 kernel with
recent cifs-2.6.git/for-next which includes e.g. the 5 directory
caching fixes) generic/241 to Windows. See attached dmesg log.  Any
thoughts?
I did not see it hang on a previous run.

Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0
Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0
Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]

On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@xxxxxxxxxxxxxx> wrote:
>
> v2:
> - Added locking in closed_all_cached_dirs()
> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
>   dir dentries, and split out the "drop dentry" work from "potential
>   SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
>   server traffic, but can ensure all "drop dentry" work has run.
> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>
>
> The SMB client cached directory functionality can either leak a cfid if
> open_cached_dir() races with a reconnect, or can have races between the
> unmount process and cached dir cleanup/lease breaks that all lead to
> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
> These all manifest as a pair of BUGs when unmounting:
>
>     [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
>     [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>
> These issues started with the lease directory cache handling introduced in
> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
> held"), and go away if I mount with 'nohandlecache'.
>
> I'm able to reproduce the "Dentry still in use" errors by connecting to an
> actively-used SMB share (the server organically generates lease breaks) and
> leaving these running for 'a while':
>
> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>
> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> cleanest reproducer, but I stopped iterating once I had something that was
> even remotely reliable for me...)
>
> This series attempts to fix these, as well as a use-after-free that could
> occur because open_cached_dir() explicitly frees the cached_fid, rather than
> relying on reference counting.
> Paul Aurich (4):
>   smb: cached directories can be more than root file handle
>   smb: Don't leak cfid when reconnect races with open_cached_dir
>   smb: prevent use-after-free due to open_cached_dir error paths
>   smb: During unmount, ensure all cached dir instances drop their dentry
>
>  fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
>  fs/smb/client/cached_dir.h |   6 +-
>  fs/smb/client/cifsfs.c     |  14 ++-
>  fs/smb/client/cifsglob.h   |   3 +-
>  fs/smb/client/inode.c      |   3 -
>  fs/smb/client/trace.h      |   3 +
>  6 files changed, 179 insertions(+), 78 deletions(-)
>
> --
> 2.45.2
>
>

-- 
Thanks,

Steve
Attachment:
messages

Description: Binary data