And here is a recent bugzilla related to umount issues (although this
could be distinct from the three other umount issues I mentioned
above): https://bugzilla.kernel.org/show_bug.cgi?id=219097

On Sun, Jul 28, 2024 at 11:16 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> On Sun, Jul 28, 2024 at 7:01 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sun, Jul 28, 2024 at 02:09:14PM -0500, Steve French wrote:
> >
> > > Since umount does not notify the filesystem on unmount until
> > > references are closed (unless you do "umount --force"), and the
> > > filesystem is therefore only notified at kill_sb time, an easier
> > > approach to fixing some of the problems where resources are kept
> > > around too long (e.g. cached handles or directory entries, or
> > > references on the mount are held) may be to add a mount helper
> > > which notifies the fs (e.g. via an fs-specific ioctl) when umount
> > > has begun. That may be an easier solution than adding a VFS call
> > > to notify the fs when umount begins.
> >
> > "references on the mount being held" is not something any userland
> > helpers have a chance to help with.
>
> I don't know the exact reasons why at least three filesystems have
> umount helpers, but presumably they can issue private ioctls (or
> equivalent) to release resources. I am very curious whether their
> reasons would overlap any common SMB3.1.1 network mount use cases.
>
> > What exactly gets leaked in your tests? And what would that userland
> > helper do when umount happens due to the last process in a given
> > namespace getting killed, for example? Any unexpected "busy" at
> > umount(2) time would translate into filesystem instances stuck
> > around (already detached from any mount trees) for unspecified time;
> > not a good thing, obviously, and not something a userland helper had
> > a chance to help with...
> >
> > Details, please.
>
> There are three things in particular that got me thinking about how
> other filesystems handle umount (and whether the umount helper concept
> is a bad idea or a good idea for network filesystems):
>
> 1) Better resource usage: network filesystems often have cached
> information due to leases (or "delegations" in NFS terminology) on
> files or directory entries. Waiting until kill_superblock (rather than
> when umount began) can waste resources. This cached information is not
> automatically released when the file or directory is closed (note that
> "deferred close" of files can be a huge performance win for network
> filesystems which support safe caching via leases/delegations), but
> these caches consume resources that ideally would be freed when umount
> starts, yet often have to wait until kill_sb is invoked to be freed.
> If "umount_begin" were always called, then (assuming there were not
> multiple mounts from the same client to that server share) cifs.ko
> could, roughly as sketched below:
>  a) close all deferred network file handles (freeing up some resources)
>  b) stop waiting for any pending network i/o requests
>  c) mark the tree connection (the connection to the server share) as
>     "EXITING" so we don't have races sending new i/o operations on
>     that share
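>
> A rough sketch of what such a hook could do for (a)-(c) above
> (illustrative only - the helper names, fields and locking here are
> approximations of what is in fs/smb/client, not necessarily the exact
> cifs.ko code):
>
> static void cifs_umount_begin_sketch(struct super_block *sb)
> {
>         struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
>         struct cifs_tcon *tcon;
>
>         if (cifs_sb == NULL)
>                 return;
>
>         tcon = cifs_sb_master_tcon(cifs_sb);
>
>         spin_lock(&cifs_tcp_ses_lock);
>         if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
>                 /* other mounts of this share, or already exiting:
>                    nothing to do */
>                 spin_unlock(&cifs_tcp_ses_lock);
>                 return;
>         }
>         /* (c) mark the tree connection as exiting so no new i/o is
>            sent on that share */
>         tcon->status = TID_EXITING;
>         spin_unlock(&cifs_tcp_ses_lock);
>
>         /* (a) drop deferred (cached) handles so the server can
>            reclaim them */
>         cifs_close_all_deferred_files(tcon);
>
>         /* (b) wake up anything blocked waiting on this server's
>            request/response queues */
>         if (tcon->ses && tcon->ses->server) {
>                 wake_up_all(&tcon->ses->server->request_q);
>                 wake_up_all(&tcon->ses->server->response_q);
>         }
> }
>
> In a sketch like this only (c) needs to happen under the lock; (a)
> and (b) can run afterwards, since the EXITING state already keeps new
> i/o from being sent on that tree connection.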
>
> 2) Fixing races between umount and mount: there are some common test
> scenarios where we can run a series of xfstests that will eventually
> fail (e.g. by the time the xfstest run gets to 043 or 044, to a Samba
> server on localhost for example, they sometimes hit races which cause
> this message:
>
> QA output created by 043
> +umount: /mnt-local-xfstest/scratch: target is busy.
> +mount error(16): Device or resource busy
>
> but it works fine if a delay is inserted between these tests). I will
> try some experiments to see if changing xfstests to call force
> unmount, which calls "umount_begin" (or adding a umount wrapper to do
> the same), also avoids the problem. It could be that references are
> briefly being held by cifs.ko, causing the VFS to think that files
> are still open and to delay calling into cifs.ko to kill_superblock.
> This needs more investigation, but "umount --force" (or equivalent)
> may help.
>
> 3) Races in cleaning up directory cache information: there was a
> patch introduced for periodically cleaning up the directory cache
> (this is only an issue with servers like Windows or NetApp etc. that
> support directory leases, so you don't see it with Samba and various
> other common servers that don't enable directory leases) that can
> cause crashes at unmount (use-after-free). I want to try to narrow it
> down soon, but it was a little tricky (and the assumption was that
> force unmount would avoid the problem, i.e. the call to
> "umount_begin"). It looks like this patch causes the intermittent
> umount crash to Windows servers:
>
> commit d14de8067e3f9653cdef5a094176d00f3260ab20
> Author: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
> Date:   Thu Jul 6 12:32:24 2023 +1000
>
>     cifs: Add a laundromat thread for cached directories
>
>     and drop cached directories after 30 seconds
>
> Steve

-- 
Thanks,

Steve