The perf improvement allowing caching of directories (helping "ls" then "ls" again for same dir) not just the perf improvement with "ls "then "stat" could be very important. Did this series also address Ralph's point about missing requesting "H" (handle caching) flag when requesting directory leases from the server? On Fri, Nov 8, 2024 at 4:35 PM Paul Aurich <paul@xxxxxxxxxxxxxx> wrote: > > The SMB client cached directory functionality has a few problems around > flaky/lost server connections, which manifest as a pair of BUGs when > eventually unmounting the server connection: > > [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] > [18645.789274] VFS: Busy inodes after unmount of cifs (cifs) > > Based on bisection, these issues started with the lease directory cache > handling introduced in commit ebe98f1447bb ("cifs: enable caching of > directories for which a lease is held"), and go away if I mount with > 'nohandlecache'. I started seeing these on Debian Bookworm stable kernel > (v6.1.x), but the issues persist even in current git versions. I think the > situation was improved (occurrence frequency went down) with > commit 5c86919455c1 ("smb: client: fix use-after-free in > smb2_query_info_compound()"). > > > I'm able to reproduce the "Dentry still in use" errors by connecting to an > actively-used SMB share (the server organically generates lease breaks) and > leaving these running for 'a while': > > - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done > - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done > > ('a while' is anywhere from 10 minutes to overnight. Also, it's not the > cleanest reproducer, but I stopped iterating once I had something that was > even remotely reliable for me...) > > This series attempts to fix these, as well as a use-after-free that could > occur because open_cached_dir() explicitly frees the cached_fid, rather than > relying on reference counting. > > > The last patch in this series (smb: During umount, flush any pending lease > breaks for cached dirs) is a work-in-progress, and should not be taken as-is. > The issue there: > > Unmounting an SMB share (cifs_kill_sb) can race with a queued lease break from > the server for a cached directory. When this happens, the cfid is removed > from the linked list, so close_all_cached_dirs() cannot drop the dentry. If > cifs_kill_sb continues on before the queued work puts the dentry, we trigger > the "Dentry still in use" BUG splat. Flushing the cifsiod_wq seems to help > this, but some thoughts: > > 1. cifsiod_wq is a global workqueue, rather than per-mount. Flushing the > entire workqueue is potentially doing more work that necessary. Should > there be a workqueue that's more appropriately scoped? > 2. With an unresponsive server, this causes umount (even umount -l) to hang > (waiting for SMB2_close calls), and when I test with backports on a 6.1 > kernel, appears to cause a deadlock between kill_sb and some cifs > reconnection code calling iterate_supers_type. (Pretty sure the deadlock > was addressed by changes to fs/super.c, so not really an SMB problem, but > just an indication that flush_waitqueue isn't the right solution). > 3. Should cached_dir_lease_break() drop the dentry before queueing work > (and if so, is it OK to do this under the spinlock, or should the spinlock > be dropped first)? > 4. Related to #3 -- shouldn't close_all_cached_dirs() be holding the spinlock > while looping? > > > Lastly, patches 2, 3, and 5 (in its final form) are beneficial going back to > v6.1 for stable, but it's not a clean backport because some other important > fixes (commit 5c86919455c1 ("smb: client: fix use-after-free in > smb2_query_info_compound()") weren't picked up. > > Paul Aurich (5): > smb: cached directories can be more than root file handle > smb: Don't leak cfid when reconnect races with open_cached_dir > smb: prevent use-after-free due to open_cached_dir error paths > smb: No need to wait for work when cleaning up cached directories > smb: During umount, flush any pending lease breaks for cached dirs > > fs/smb/client/cached_dir.c | 106 ++++++++++++++++--------------------- > 1 file changed, 47 insertions(+), 59 deletions(-) > > -- > 2.45.2 > > -- Thanks, Steve