On Thu, 2019-10-03 at 09:01 -0400, Chuck Lever wrote:
> > On Oct 2, 2019, at 8:27 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > 
> > On Wed, Oct 02 2019, Chuck Lever wrote:
> > 
> > > Hi Trond-
> > > 
> > > We (Oracle) had another (fairly rare) instance of a weekend
> > > maintenance window where an NFS server's IP address changed while
> > > there were mounted clients. It brought up the issue again of how
> > > we (the Linux NFS community) would like to deal with cases where
> > > a client administrator has to deal with a moribund mount (like
> > > that alliteration :-).
> > 
> > What exactly is the problem that this caused?
> > 
> > As I understand it, a moribund mount can still be unmounted with
> > "-l" and processes accessing it can still be killed
> 
> I was asking about "-o remount,soft" because I was not certain
> about the outcome last time this conversation was in full swing.
> The gist then is that we want "umount -l" and "umount -f" to
> work reliably and as advertised?

'umount -l' and 'umount -f' are both inherently flawed. The former
because it just hides the hanging RPC calls in the kernel (causing
resource leaks left, right and center), and the latter because it is a
single point-in-time operation. When you do 'umount -f', it will try
to kill all pending RPC calls, but it does nothing to prevent further
calls from being scheduled.

So yes, at some point it would be good to be able to kill requests
from a permanently hanging server through some other means. One of the
ideas that I do like is being able to remount as 'soft' so that the
RPC calls simply time out. That solves the problem, and does not
compromise the case where the server comes back up and we remount the
super block in order to continue operations.

That said, there are a few impediments to making that work. As far as
I can tell, none are insurmountable, but they need to be solved.
For instance, one such impediment is the fact that the way soft mounts
work these days is by tagging each RPC task with the flag
RPC_TASK_SOFT (and/or RPC_TASK_TIMEOUT, depending on which error value
you want the call to return). This tag is set in task->tk_flags, which
is assumed to be constant throughout the lifetime of the RPC task.
This is why we can test RPC_IS_SOFT(task) before deciding how we want
to call rpc_sleep_on(). If a third party wants to change that tag, and
then wake up the task in order to have it try to time out, then code
snippets like the following in xprt_reserve_xprt()

	if (RPC_IS_SOFT(task))
		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
				xprt_request_timeout(req));
	else
		rpc_sleep_on(&xprt->sending, task, NULL);

would need to be replaced by something that is atomic.

> > ... except....
> > There are some waits in the VFS/MM which are not TASK_KILLABLE and
> > probably should be. I think that "we" definitely want "someone" to
> > track them down and fix them.
> 
> I agree... and "someone" could mean me or someone here at Oracle.
> 
> > > Does remounting with "soft" work today? That seems like the most
> > > direct way to deal with this particular situation.
> > 
> > I don't think this does work, and it would be non-trivial (but
> > maybe not impossible) to mark all the outstanding RPCs as also
> > "soft".
> 
> The problem I've observed with umount is umount_begin does the
> killall_tasks call, then the client issues some additional requests.
> Those are the requests that get stuck before umount_end can finally
> shutdown the RPC client. umount_end is never called because those
> requests are "hard".
> 
> We have rpc_killall_tasks which loops over all of an rpc_clnt's
> outstanding RPC tasks. nfs_umount_begin could do something like
> 
> - set the rpc_clnt's "soft" flag
> - kill all tasks
> 
> Then any new tasks would timeout eventually. Just a thought, maybe
> not a good one.
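To illustrate the atomicity problem with tk_flags discussed above,
here is a rough userspace-only sketch (this is not the actual sunrpc
code; the struct, helper names, and flag value are invented for
illustration) of how C11 atomics would let a third party set the soft
bit while the task concurrently tests it:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag value only, not the real kernel constant. */
#define RPC_TASK_SOFT_BIT 0x0002u

/* Minimal stand-in for struct rpc_task: tk_flags made atomic so that
 * concurrent modification by a third party is well defined. */
struct rpc_task_sketch {
	atomic_uint tk_flags;
};

/* Analogue of RPC_IS_SOFT(): an atomic read, safe even while another
 * thread is flipping the flag. */
static bool rpc_is_soft_sketch(struct rpc_task_sketch *task)
{
	return (atomic_load(&task->tk_flags) & RPC_TASK_SOFT_BIT) != 0;
}

/* What a "remount as soft" operation might do per task: set the bit
 * atomically; the real code would then also wake the task so it can
 * requeue itself with a timeout instead of sleeping forever. */
static void rpc_task_make_soft_sketch(struct rpc_task_sketch *task)
{
	atomic_fetch_or(&task->tk_flags, RPC_TASK_SOFT_BIT);
}
```

Note that an atomic read is only half the story: the RPC_IS_SOFT()
test and the rpc_sleep_on() enqueue would still have to happen under
the same queue lock (or the waker would have to requeue the task
itself), otherwise a task can test the flag, lose the race, and still
park on the untimed queue.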
> 
> There's also using SOFTCONN for all tasks after killall is called:
> if the client can't reconnect to the server, these tasks would fail
> immediately.
> 
> > If we wanted to follow a path like this (and I suspect we don't),
> > I would hope that we could expose the server connection (shared
> > among multiple mounts) in sysfs somewhere, and could then set
> > "soft" (or "dead") on that connection, rather than having to do it
> > on every mount from the particular server.
> 
> I think of your use case from last time: client shutdown should be
> reliable. Seems like making "umount -f" reliable would be better
> for that use case, and would work for the "make client mount points
> recoverable after server dies" case too.

'umount -f' is intended as a point-in-time operation, which is why it
is implemented as 'umount_begin' in const struct super_operations
nfs_sops. It is not intended to act as a state-changing operation on
the super block. If it were, it would need to ensure that we also hide
such a super block from being found when you try to mount again, and
it would need to ensure that you don't inadvertently end up with a
surviving duplicate.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx