And here is a recent bugzilla related to umount issues (although this
could be distinct from the three other umount issues I mentioned
above): https://bugzilla.kernel.org/show_bug.cgi?id=219097

On Sun, Jul 28, 2024 at 11:16 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> On Sun, Jul 28, 2024 at 7:01 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sun, Jul 28, 2024 at 02:09:14PM -0500, Steve French wrote:
> >
> > > Since umount does not notify the filesystem on unmount until
> > > references are closed (unless you do "umount --force"), and the
> > > filesystem is therefore only notified at kill_sb time, an easier
> > > approach to fixing some of the problems where resources are kept
> > > around too long (e.g. cached handles or directory entries, or
> > > references on the mount are held) may be to add a mount helper
> > > which notifies the fs (e.g. via an fs-specific ioctl) when umount
> > > has begun. That may be an easier solution than adding a VFS call
> > > to notify the fs when umount begins.
> >
> > "references on the mount being held" is not something any userland
> > helpers have a chance to help with.
>
> I don't know the exact reasons why at least three filesystems have
> umount helpers, but presumably they can issue private ioctls (or
> equivalent) to release resources. I am very curious whether their
> reasons would overlap any common SMB3.1.1 network mount use cases.
>
> > What exactly gets leaked in your tests? And what would that userland
> > helper do when umount happens due to the last process in a given
> > namespace getting killed, for example? Any unexpected "busy" at
> > umount(2) time would translate into filesystem instances stuck
> > around (already detached from any mount trees) for unspecified time;
> > not a good thing, obviously, and not something a userland helper had
> > a chance to help with...
> >
> > Details, please.
>
> There are three things in particular that got me thinking about how
> other filesystems handle umount (and whether the umount helper concept
> is a bad idea or a good idea for network filesystems):
>
> 1) Better resource usage: network filesystems often have cached
> information due to leases (or "delegations" in NFS terminology) on
> files or directory entries. Waiting until kill_superblock (rather than
> when umount began) can waste resources. This cached information is not
> automatically released when the file or directory is closed (note that
> "deferred close" of files can be a huge performance win for network
> filesystems which support safe caching via leases/delegations), but
> these caches consume resources that ideally would be freed when umount
> starts, yet often have to wait until kill_sb is invoked to be freed.
> If "umount_begin" were always called, then (assuming there were not
> multiple mounts from the same client to that server share) cifs.ko
> could, roughly as sketched below:
>  a) close all deferred network file handles (freeing up some resources)
>  b) stop waiting for any pending network i/o requests
>  c) mark the tree connection (the connection to the server share) as
>     "EXITING" so we don't have races sending new i/o operations on
>     that share
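>
> A rough sketch of what such a hook could do for (a)-(c) above
> (illustrative only - the helper names, fields and locking here are
> approximations of what is in fs/smb/client, not necessarily the exact
> cifs.ko code):
>
> static void cifs_umount_begin_sketch(struct super_block *sb)
> {
>         struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
>         struct cifs_tcon *tcon;
>
>         if (cifs_sb == NULL)
>                 return;
>
>         tcon = cifs_sb_master_tcon(cifs_sb);
>
>         spin_lock(&cifs_tcp_ses_lock);
>         if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) {
>                 /* other mounts of this share, or already exiting:
>                    nothing to do */
>                 spin_unlock(&cifs_tcp_ses_lock);
>                 return;
>         }
>         /* (c) mark the tree connection as exiting so no new i/o is
>            sent on that share */
>         tcon->status = TID_EXITING;
>         spin_unlock(&cifs_tcp_ses_lock);
>
>         /* (a) drop deferred (cached) handles so the server can
>            reclaim them */
>         cifs_close_all_deferred_files(tcon);
>
>         /* (b) wake up anything blocked waiting on this server's
>            request/response queues */
>         if (tcon->ses && tcon->ses->server) {
>                 wake_up_all(&tcon->ses->server->request_q);
>                 wake_up_all(&tcon->ses->server->response_q);
>         }
> }
>
> In a sketch like this only (c) needs to happen under the lock; (a)
> and (b) can run afterwards, since the EXITING state already keeps new
> i/o from being sent on that tree connection.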
>
> 2) Fixing races between umount and mount: there are some common test
> scenarios where we can run a series of xfstests that will eventually
> fail (e.g. by the time the xfstest run gets to 043 or 044, to a Samba
> server on localhost for example, they sometimes hit races which cause
> this message:
>
> QA output created by 043
> +umount: /mnt-local-xfstest/scratch: target is busy.
> +mount error(16): Device or resource busy
>
> but it works fine if a delay is inserted between these tests). I will
> try some experiments to see if changing xfstests to call force
> unmount, which calls "umount_begin" (or adding a umount wrapper to do
> the same), also avoids the problem. It could be that references are
> briefly being held by cifs.ko, causing the VFS to think that files
> are still open and to delay calling into cifs.ko to kill_superblock.
> This needs more investigation, but "umount --force" (or equivalent)
> may help.
>
> 3) Races in cleaning up directory cache information: there was a
> patch introduced for periodically cleaning up the directory cache
> (this is only an issue with servers like Windows or NetApp etc. that
> support directory leases, so you don't see it with Samba and various
> other common servers that don't enable directory leases) that can
> cause crashes at unmount (use-after-free). I want to try to narrow it
> down soon, but it was a little tricky (and the assumption was that
> force unmount would avoid the problem, i.e. the call to
> "umount_begin"). It looks like this patch causes the intermittent
> umount crash to Windows servers:
>
> commit d14de8067e3f9653cdef5a094176d00f3260ab20
> Author: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
> Date:   Thu Jul 6 12:32:24 2023 +1000
>
>     cifs: Add a laundromat thread for cached directories
>
>     and drop cached directories after 30 seconds
>
> Steve

-- 
Thanks,

Steve