On Fri, 2017-11-17 at 11:45 -0600, Joshua Watt wrote: > Latest attempt at unifying the various constraints for force > umounting > NFS servers that have disappeared in a timely manner. Namely: > * umount2(..., MNT_FORCE) should not be made stronger, unless we > know > this is the final umount() > * The "failed server" state should be reversible > * The mechanism should be able to "unstick" a sync(2) that is stuck > on > an unresponsive server > > I believe the proposal satisfies all of these concerns. There are a > few > major components to this proposal: > 1) The umount_begin superblock operation now has a corresponding > umount_end operation. This is invoked by umount() when MNT_FORCE > is > specified (like umount_begin()), but the actual unmount failed > (i.e. > the mount is busy). > 2) Instead of killing all the RPC queued at a single point in time, > the > NFS mount now kills all queue RPCs and all RPCs that get queued > between nfs_umount_begin() and nfs_umount_end(). I believe this > is > not a significant change in behavior because there were always > races > between queuing RPCs and killing them in nfs_umount_begin(). > 3) nfs_umount_end() is *not* called when MNT_DETACH is specified. > This > is the indication that the user is done with this mount and all > RPCs will be killed until the mount finally gets removed. > 4) The new "transient" mount option prevents sharing nfs_clients > between multiple superblocks. The only exception to this is when > the > kernel does an automatic mount to cross a device boundary ("xdev" > NFS mount). In this case, the existing code always shares the > existing nfs_client from parent superblock. The "transient" mount > option implies "nosharecache", as it doesn't make sense to share > superblocks if clients aren't shared. > 5) If the "transient" mount option is specified (and hence the > nfs_client is not shared), MNT_FORCE kills all RPCs for the > entire > nfs_client (and all its nfs_servers). This effectively enables > the > "burn down the forest" option when combined with MNT_DETACH. > > The choice to use MNT_FORCE as the mechanism for triggering this > behavior stems from the desire to unstick sync(2) calls that might be > blocked on a non-responsive NFS server. While it was previously > discussed to remount with a new mount option to enable this behavior, > this cannot release the blocked sync(2) call because both > sync_fileystem() and do_remount() lock the struct superblock- > >s_umount > reader-writer lock. As such, a remount will block until the sync(2) > finishes, which is undesirable. umount2() doesn't have this > restriction > and can unblock the sync(2) call. > > For the most part, all existing behavior is unchanged if the > "transient" > option is not specified. umount -f continues to behave as is has, but > umount -fl (see note below) now does a more aggressive kill to get > everything out of there and allow unmounting in a more timely manner. > Note that there will probably be some communication with the server > still, as destruction of the NFS client ID and such will occur when > the > last reference is removed. If the server is truly gone, this can > result > in long blocks at that time. > > If it is known at mount time that the server might be disappearing, > it > should be mounted with "transient". Doing this will allow mount -fl > to > do a more complete cleanup and prevent all communication with the > server, which should allow a timely cleanup in all cases. > > Notes: > > Until recently, libmount did not allow a detached and lazy mount at > the > same time. This was recently fixed (see > https://marc.info/?l=util-linux-ng&m=151000714401929&w=2). If you > want > to test this, you may need to write a simple test program that > directly > calls umount2() with MNT_FORCE | MNT_DETACH. > > Thank you all again for your time and comments, > Joshua Watt > > Joshua Watt (9): > SUNRPC: Add flag to kill new tasks > SUNRPC: Expose kill_new_tasks in debugfs > SUNRPC: Simplify client shutdown > namespace: Add umount_end superblock operation > NFS: Kill RPCs for the duration of umount > NFS: Add debugfs for nfs_server and nfs_client > NFS: Add transient mount option > NFS: Don't shared transient clients > NFS: Kill all client RPCs if transient > > fs/namespace.c | 22 ++++++- > fs/nfs/Makefile | 2 +- > fs/nfs/client.c | 96 +++++++++++++++++++++++++-- > fs/nfs/debugfs.c | 143 > +++++++++++++++++++++++++++++++++++++++++ > fs/nfs/inode.c | 5 ++ > fs/nfs/internal.h | 11 ++++ > fs/nfs/nfs3client.c | 2 + > fs/nfs/nfs4client.c | 5 ++ > fs/nfs/nfs4super.c | 1 + > fs/nfs/super.c | 81 ++++++++++++++++++++--- > include/linux/fs.h | 1 + > include/linux/nfs_fs_sb.h | 6 ++ > include/linux/sunrpc/clnt.h | 1 + > include/uapi/linux/nfs_mount.h | 1 + > net/sunrpc/clnt.c | 13 ++-- > net/sunrpc/debugfs.c | 5 ++ > net/sunrpc/sched.c | 3 + > 17 files changed, 372 insertions(+), 26 deletions(-) > create mode 100644 fs/nfs/debugfs.c > Anyone have any comments on this? Sorry fo the churn, it took a few tries to get this to where it would work. I also realize I should have put all my RFC patchsets in-reply-to each other instead of starting a new thread for each one. Thanks for your time, Joshua Watt -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html