I noticed a hang today in generic/246 immediately after a crash in umount in the previous test, generic/241 running with 6.12 kernel with recent cifs-2.6.git/for-next which includes e.g. the 5 directory caching fixes) generic/241 to Windows. See attached dmesg log. Any thoughts? I did not see it hang on a previous run. Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0 Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10 Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0 Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0 Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220 Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0 Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40 Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs] On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@xxxxxxxxxxxxxx> wrote: > > v2: > - Added locking in closed_all_cached_dirs() > - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached > dir dentries, and split out the "drop dentry" work from "potential > SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on > server traffic, but can ensure all "drop dentry" work has run. > - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry > > > The SMB client cached directory functionality can either leak a cfid if > open_cached_dir() races with a reconnect, or can have races between the > unmount process and cached dir cleanup/lease breaks that all lead to > a cached_dir instance not dropping its dentry ref in close_all_cached_dirs(). > These all manifest as a pair of BUGs when unmounting: > > [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] > [18645.789274] VFS: Busy inodes after unmount of cifs (cifs) > > These issues started with the lease directory cache handling introduced in > commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is > held"), and go away if I mount with 'nohandlecache'. > > I'm able to reproduce the "Dentry still in use" errors by connecting to an > actively-used SMB share (the server organically generates lease breaks) and > leaving these running for 'a while': > > - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done > - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done > > ('a while' is anywhere from 10 minutes to overnight. Also, it's not the > cleanest reproducer, but I stopped iterating once I had something that was > even remotely reliable for me...) > > This series attempts to fix these, as well as a use-after-free that could > occur because open_cached_dir() explicitly frees the cached_fid, rather than > relying on reference counting. > Paul Aurich (4): > smb: cached directories can be more than root file handle > smb: Don't leak cfid when reconnect races with open_cached_dir > smb: prevent use-after-free due to open_cached_dir error paths > smb: During unmount, ensure all cached dir instances drop their dentry > > fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------ > fs/smb/client/cached_dir.h | 6 +- > fs/smb/client/cifsfs.c | 14 ++- > fs/smb/client/cifsglob.h | 3 +- > fs/smb/client/inode.c | 3 - > fs/smb/client/trace.h | 3 + > 6 files changed, 179 insertions(+), 78 deletions(-) > > -- > 2.45.2 > > -- Thanks, Steve
Attachment:
messages
Description: Binary data