On Thu, 2017-11-09 at 09:34 +1100, NeilBrown wrote:
> On Wed, Nov 08 2017, J. Bruce Fields wrote:
>
> > On Wed, Nov 08, 2017 at 07:08:25AM -0500, Jeff Layton wrote:
> > > On Wed, 2017-11-08 at 14:30 +1100, NeilBrown wrote:
> > > > What do people think of the following as an approach
> > > > to Joshua's need?
> > > >
> > > > It isn't complete by itself: it needs a couple of changes to
> > > > nfs-utils so that it doesn't stat the mountpoint on remount,
> > > > and it might need another kernel change so that the "mount"
> > > > system call performs the same sort of careful lookup for
> > > > remount as the umount system call does, but those are
> > > > relatively small details.
> > > >
> > >
> > > Yeah, that'd be good.
> > >
> > > > This is the patch that you will either love or hate.
> > > >
> > > > With this patch, Joshua (or any other sysadmin) could:
> > > >
> > > >   mount -o remount,retrans=0,timeo=1 /path
> > > >
> > > > and then new requests on any mountpoint from that server will
> > > > time out quickly.
> > > > Then
> > > >   umount -f /path
> > > >   umount -f /path
> > > > ...
> > > Looks like a reasonable approach overall to preventing new RPCs
> > > from being dispatched once the "force" umount runs.
> >
> > I've lost track of the discussion--after this patch, how close are
> > we to a guaranteed force unmount?  I assume there are still a few
> > obstacles.
>
> This isn't really about forced unmount.
> The way forward to forced unmount is:
>  - make all waits on NFS be TASK_KILLABLE
>  - figure out what happens to dirty data when all processes have
>    been killed.
>
> This is about allowing processes to be told that the filesystem is
> dead so that they can respond (without blocking indefinitely)
> without necessarily being killed.
> With a local filesystem you can (in some cases) kill the underlying
> device and all processes will start getting EIO.  This is providing
> similar functionality for NFS.
>
> >
> > > I do wonder if this ought to be more automatic when you specify
> > > -f on the umount. Having to manually do a remount first doesn't
> > > seem very admin-friendly.
> >
> > It's an odd interface.  Maybe we could wrap it in something more
> > intuitive.
> >
> > I'd be nervous about making "umount -f" do it.  I think
> > administrators could be unpleasantly surprised in some cases if an
> > "umount -f" affects other mounts of the same server.
>
> I was all set to tell you that it already does, but then tested and
> found it doesn't and ....
>
> struct nfs_server (which sb->s_fs_info points to) contains
>
>	struct nfs_client *	nfs_client;	/* shared client and NFS4 state */
>
> which is shared between different mounts from the same server, and
>
>	struct rpc_clnt *	client;		/* RPC client handle */
>
> which isn't shared.
> struct nfs_client contains
>	struct rpc_clnt *	cl_rpcclient;
>
> which server->client is cloned from.
>
> The timeouts that apply to a mount are the ones from server->client,
> and so apply only to that mount (I thought they were shared, but that
> is a thought from years ago, and maybe it was wrong at the time).
> umount_begin aborts all rpcs associated with server->client.
>
> So the 'remount,retrans=0,timeo=1' that I propose would only affect
> the one superblock (all bind-mounts of course, including sharecache
> mounts).
>
> The comment in my code was wrong.
>

I by far prefer an operation that changes the superblock state to be
done using 'mount -o remount'. The idea of a 'umount -f' that makes the
superblock irreversibly change state is clearly not desirable in an
environment where the same superblock could be bind mounted and/or
mounted in multiple private namespaces.

IOW: 'remount,retrans=0,timeo=1' would be slightly preferable to
hacking up "umount -f" because it is reversible.
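
As an illustration of the structure layout Neil describes, and of why a
remount-based change is reversible, here is a minimal user-space sketch
in C. It is not kernel code: every model_* name is hypothetical, the
structs keep only the fields mentioned above, and the starting values
(timeo=600, retrans=2) are illustrative assumptions rather than values
taken from this thread. It also assumes the proposed patch, i.e. that a
remount is allowed to change timeo and retrans.

```c
#include <assert.h>

/* Hypothetical user-space model; the real kernel structs are far larger. */
struct model_rpc_timeout {
    unsigned int timeo;    /* tenths of a second, like the timeo= option */
    unsigned int retrans;  /* retries before the request is failed */
};

struct model_rpc_clnt {
    struct model_rpc_timeout timeout;
};

/* Models struct nfs_client: shared by all mounts from one server. */
struct model_nfs_client {
    struct model_rpc_clnt cl_rpcclient;
};

/* Models struct nfs_server: one per superblock. */
struct model_nfs_server {
    struct model_nfs_client *nfs_client; /* shared */
    struct model_rpc_clnt client;        /* private clone, not shared */
};

/* "Mount": the per-superblock rpc client is cloned from the shared one. */
void model_mount(struct model_nfs_server *srv, struct model_nfs_client *clp)
{
    srv->nfs_client = clp;
    srv->client = clp->cl_rpcclient; /* struct copy stands in for the clone */
}

/* "Remount": only this superblock's private rpc client is modified. */
void model_remount(struct model_nfs_server *srv,
                   unsigned int timeo, unsigned int retrans)
{
    srv->client.timeout.timeo = timeo;
    srv->client.timeout.retrans = retrans;
}

/* Walk through the scenario: fail fast on one mount, then undo it. */
int model_demo(void)
{
    struct model_nfs_client clp = {
        .cl_rpcclient = { .timeout = { .timeo = 600, .retrans = 2 } }
    };
    struct model_nfs_server a, b;
    struct model_rpc_timeout saved;

    model_mount(&a, &clp);
    model_mount(&b, &clp);

    saved = a.client.timeout;
    model_remount(&a, 1, 0);            /* remount,retrans=0,timeo=1 */
    assert(a.client.timeout.timeo == 1);
    assert(b.client.timeout.timeo == 600);            /* other mount untouched */
    assert(clp.cl_rpcclient.timeout.timeo == 600);    /* shared client untouched */

    model_remount(&a, saved.timeo, saved.retrans);    /* second remount reverses it */
    assert(a.client.timeout.timeo == 600);
    assert(a.client.timeout.retrans == 2);
    return 0;
}
```

In this model the change stays confined to one model_nfs_server and can
be undone by a second model_remount() call, which is the property that
distinguishes the remount approach from a one-way "umount -f".
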

That said, there is no reason why we couldn't implement a single mount
option that implements something akin to this "umount -f" behaviour
(and that can be reversed by resetting it through a second 'remount').
It seems to me that the requested behaviour is pretty close to what we
already implement with the RPC_TASK_SOFTCONN option.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx