Re: NFS Force Unmounting

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 8 Nov 2017 23:52:54 +0000

On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> On Wed, Nov 08 2017, J. Bruce Fields wrote:
> 
> > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > What do people think of the following as an approach
> > > > to Joshua's need?
> > > > 
> > > > It isn't complete by itself: it needs a couple of changes to
> > > > nfs-utils so that it doesn't stat the mountpoint on remount,
> > > > and it might need another kernel change so that the "mount" system
> > > > call performs the same sort of careful lookup for remount as the
> > > > umount system call does, but those are relatively small details.
> > > > 
> > > 
> > > Yeah, that'd be good.
> > > 
> > > > This is the patch that you will either love or hate.
> > > > 
> > > > With this patch, Joshua (or any other sysadmin) could:
> > > > 
> > > >   mount -o remount,retrans=0,timeo=1 /path
> > > > 
> > > > and then new requests on any mountpoint from that server will
> > > > time out quickly.
> > > > Then
> > > >   umount -f /path
> > > >   umount -f /path
> > 
> > ...
> > > Looks like a reasonable approach overall to preventing new RPCs from
> > > being dispatched once the "force" umount runs.
> > 
> > I've lost track of the discussion--after this patch, how close are we to
> > a guaranteed force unmount?  I assume there are still a few obstacles.
> 
> This isn't really about forced unmount.
> The way forward to forced unmount is:
>  - make all waits on NFS be TASK_KILLABLE
>  - figure out what happens to dirty data when all processes have
>    been killed.
> 
> This is about allowing processes to be told that the filesystem is dead
> so that they can respond (without blocking indefinitely) without
> necessarily being killed.
> With a local filesystem you can (in some cases) kill the underlying
> device and all processes will start getting EIO.  This is providing
> similar functionality for NFS.
> 
> > 
> > > I do wonder if this ought to be more automatic when you specify -f on
> > > the umount. Having to manually do a remount first doesn't seem very
> > > admin-friendly.
> > 
> > It's an odd interface.  Maybe we could wrap it in something more
> > intuitive.
> > 
> > I'd be nervous about making "umount -f" do it.  I think administrators
> > could be unpleasantly surprised in some cases if an "umount -f" affects
> > other mounts of the same server.
> 
> I was all set to tell you that it already does, but then tested and
> found it doesn't and ....
> 
> struct nfs_server (which sb->s_fs_info points to) contains
> 
> 	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
> 
> which is shared between different mounts from the same server, and
> 
> 	struct rpc_clnt *	client;		/* RPC client handle */
> 
> which isn't shared.
> struct nfs_client contains
> 	struct rpc_clnt *	cl_rpcclient;
> 
> which server->client is cloned from.
> 
> The timeouts that apply to a mount are the ones from server->client,
> and so apply only to that mount (I thought they were shared, but that is
> a thought from years ago, and maybe it was wrong at the time).
> umount_begin aborts all rpcs associated with server->client.
> 
> So the 'remount,retrans=0,timeo=1' that I propose would only affect the
> one superblock (all bind-mounts of course, including sharecache mounts).
> 
> The comment in my code was wrong.
> 

I by far prefer an operation that changes the superblock state to be
done using 'mount -o remount'. The idea of a 'umount -f' that makes the
superblock irreversibly change state is clearly not desirable in an
environment where the same superblock could be bind mounted and/or
mounted in multiple private namespaces.

IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to
hacking up "umount -f" because it is reversible.
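
To make the "reversible" point concrete, here is a minimal dry-run sketch of
the two remounts. It only prints the commands. It assumes the kernel and
nfs-utils changes proposed earlier in this thread, so that a remount really
does update retrans/timeo on a live mount. The function names are
hypothetical, and the restore values (2 and 600) are placeholders for
whatever the mount was using beforehand.

```shell
#!/bin/sh
# Dry-run sketch: each function only prints the command an admin would run.
# Assumption: the proposed patches are applied, so 'remount' can change
# retrans/timeo on an already-mounted NFS superblock.

# Make new requests on this superblock time out quickly.
fail_fast_cmd() {
    printf 'mount -o remount,retrans=0,timeo=1 %s\n' "$1"
}

# Undo it with a second remount. The retrans/timeo values are the ones the
# mount used before; they are passed in by the caller, not guessed here.
restore_cmd() {
    printf 'mount -o remount,retrans=%s,timeo=%s %s\n' "$2" "$3" "$1"
}

fail_fast_cmd /path
restore_cmd /path 2 600   # placeholder values
```

Nothing here is irreversible: the second remount simply puts the previous
timeout settings back on the same superblock.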

That said, there is no reason why we couldn't implement a single mount
option that implements something akin to this "umount -f" behaviour
(and that can be reversed by resetting it through a second 'remount').
It seems to me that the requested behaviour is already pretty close to
what we already implement with the RPC_TASK_SOFTCONN option.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx