Re: NFS Force Unmounting

NeilBrown <neilb@xxxxxxxx> · Thu, 09 Nov 2017 09:34:59 +1100

On Wed, Nov 08 2017, J. Bruce Fields wrote:

> On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
>> On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
>> > What do people think of the following as an approach
>> > to Joshua's need?
>> > 
>> > It isn't complete by itself: it needs a couple of changes to
>> > nfs-utils so that it doesn't stat the mountpoint on remount,
>> > and it might need another kernel change so that the "mount" system
>> > call performs the same sort of careful lookup for remount as the umount
>> > system call does, but those are relatively small details.
>> > 
>> 
>> Yeah, that'd be good.
>> 
>> > This is the patch that you will either love or hate.
>> > 
>> > With this patch, Joshua (or any other sysadmin) could:
>> > 
>> >   mount -o remount,retrans=0,timeo=1 /path
>> > 
>> > and then new requests on any mountpoint from that server will timeout
>> > quickly.
>> > Then
>> >   umount -f /path
>> >   umount -f /path
> ...
>> Looks like a reasonable approach overall to preventing new RPCs from
>> being dispatched once the "force" umount runs.
>
> I've lost track of the discussion--after this patch, how close are we to
> a guaranteed force unmount?  I assume there are still a few obstacles.

This isn't really about forced unmount.
The way forward to forced unmount is:
 - make all waits on NFS be TASK_KILLABLE
 - figure out what happens to dirty data when all processes have
   been killed.

This is about allowing processes to be told that the filesystem is dead
so that they can respond (without blocking indefinitely) without
necessarily being killed.
With a local filesystem you can (in some cases) kill the underlying
device and all processes will start getting EIO.  This is providing
similar functionality for NFS.
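
To illustrate what "respond without being killed" could look like from
the application side, here is a minimal userspace sketch. It assumes the
failure surfaces as EIO, as it does when a local device is killed; the
names read_checked() and enum fs_read_status are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum fs_read_status {
	FS_READ_OK,		/* read succeeded; *out holds the byte count */
	FS_READ_DEAD,		/* EIO: treat the filesystem as dead */
	FS_READ_OTHER_ERROR	/* any other errno; caller decides */
};

/*
 * Read once, retrying only on EINTR, and classify the outcome so the
 * caller can give up cleanly on EIO instead of retrying forever.
 */
static enum fs_read_status read_checked(int fd, void *buf, size_t len,
					ssize_t *out)
{
	ssize_t n;

	assert(buf != NULL && out != NULL);

	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	if (n >= 0) {
		*out = n;
		return FS_READ_OK;
	}
	return errno == EIO ? FS_READ_DEAD : FS_READ_OTHER_ERROR;
}
```

A caller that gets FS_READ_DEAD can close the descriptor and report the
failure rather than sitting in an uninterruptible wait.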

>
>> I do wonder if this ought to be more automatic when you specify -f on
>> the umount. Having to manually do a remount first doesn't seem very
>> admin-friendly.
>
> It's an odd interface.  Maybe we could wrap it in something more
> intuitive.
>
> I'd be nervous about making "umount -f" do it.  I think administrators
> could be unpleasantly surprised in some cases if an "umount -f" affects
> other mounts of the same server.

I was all set to tell you that it already does, but then tested and
found it doesn't and ....

struct nfs_server (which sb->s_fs_info points to) contains

	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */

which is shared between different mounts from the same server, and

	struct rpc_clnt *	client;		/* RPC client handle */

which isn't shared.
struct nfs_client contains
	struct rpc_clnt *	cl_rpcclient;

which server->client is cloned from.

The timeouts that apply to a mount are the ones from server->client,
and so apply only to that mount (I thought they were shared, but that is
a thought from years ago, and maybe it was wrong at the time).
umount_begin aborts all rpcs associated with server->client.

So the 'remount,retrans=0,timeo=1' that I propose would only affect the
one superblock (all bind-mounts of course, including sharecache mounts).
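
As a rough illustration of why the remount stays local to one mount,
here is a simplified userspace model of the layout described above. It
is not the kernel code: the model_* names are hypothetical, the structs
carry only the two fields of interest, and the clone is a plain copy
standing in for whatever the real RPC client cloning does.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct rpc_clnt: only the timeout settings. */
struct model_rpc_clnt {
	unsigned int timeo;	/* tenths of a second, like the timeo= option */
	unsigned int retrans;
};

/* Stand-in for struct nfs_client: shared by all mounts from one server. */
struct model_nfs_client {
	struct model_rpc_clnt *cl_rpcclient;
};

/* Stand-in for struct nfs_server: one per superblock. */
struct model_nfs_server {
	struct model_nfs_client *nfs_client;	/* shared */
	struct model_rpc_clnt *client;		/* private clone */
};

/* Copy the parent's settings into a new, independent client. */
static struct model_rpc_clnt *model_clone_clnt(const struct model_rpc_clnt *parent)
{
	struct model_rpc_clnt *clone;

	assert(parent != NULL);
	clone = malloc(sizeof(*clone));
	if (clone)
		*clone = *parent;
	return clone;
}

/* "Mount": attach to the shared nfs_client and clone its RPC client. */
static int model_mount(struct model_nfs_server *server,
		       struct model_nfs_client *clp)
{
	server->nfs_client = clp;
	server->client = model_clone_clnt(clp->cl_rpcclient);
	return server->client ? 0 : -1;
}

/* "Remount": change timeouts on this mount's private client only. */
static void model_remount(struct model_nfs_server *server,
			  unsigned int timeo, unsigned int retrans)
{
	server->client->timeo = timeo;
	server->client->retrans = retrans;
}

/* "Umount": release the private clone. */
static void model_umount(struct model_nfs_server *server)
{
	free(server->client);
	server->client = NULL;
}
```

In this model, calling model_remount(&a, 1, 0) on one of two mounts
sharing the same model_nfs_client changes a.client only; the other
mount's client and the shared cl_rpcclient keep their original values.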

The comment in my code was wrong.

Thanks,
NeilBrown