> On Oct 2, 2019, at 8:27 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Wed, Oct 02 2019, Chuck Lever wrote:
>
>> Hi Trond-
>>
>> We (Oracle) had another (fairly rare) instance of a weekend maintenance
>> window where an NFS server's IP address changed while there were mounted
>> clients. It brought up the issue again of how we (the Linux NFS community)
>> would like to deal with cases where a client administrator has to deal
>> with a moribund mount (like that alliteration :-).
>
> What exactly is the problem that this caused?
>
> As I understand it, a moribund mount can still be unmounted with "-l"
> and processes accessing it can still be killed

I was asking about "-o remount,soft" because I was not certain about
the outcome last time this conversation was in full swing.

The gist then is that we want "umount -l" and "umount -f" to work
reliably and as advertised?

> ... except....
> There are some waits the VFS/MM which are not TASK_KILLABLE and
> probably should be. I think that "we" definitely want "someone" to
> track them down and fix them.

I agree... and "someone" could mean me or someone here at Oracle.

>> Does remounting with "soft" work today? That seems like the most direct
>> way to deal with this particular situation.
>
> I don't think this does work, and it would be non-trivial (but maybe not
> impossible) to mark all the outstanding RPCs as also "soft".

The problem I've observed with umount is that umount_begin does the
killall_tasks call, then the client issues some additional requests.
Those are the requests that get stuck before umount_end can finally
shut down the RPC client. umount_end is never called because those
requests are "hard".

We have rpc_killall_tasks, which loops over all of an rpc_clnt's
outstanding RPC tasks. nfs_umount_begin could do something like

- set the rpc_clnt's "soft" flag
- kill all tasks

Then any new tasks would time out eventually. Just a thought, maybe
not a good one.
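
To make the ordering argument concrete, here is a toy userspace model of
those two steps. It is only a sketch, not kernel code: Client, Task,
umount_begin, major_timeout, and simulate are hypothetical names standing
in for rpc_clnt, RPC tasks, and nfs_umount_begin, and it assumes a single
major timeout is enough for a "soft" task to give up.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for one RPC task."""
    name: str
    soft: bool                # soft tasks give up on a major timeout
    status: str = "pending"   # pending | killed | timed_out


@dataclass
class Client:
    """Hypothetical stand-in for an rpc_clnt."""
    soft: bool = False
    tasks: list = field(default_factory=list)

    def submit(self, name):
        # New tasks inherit the client's current soft/hard setting.
        task = Task(name, soft=self.soft)
        self.tasks.append(task)
        return task

    def killall(self):
        # Models rpc_killall_tasks: only tasks outstanding right now die.
        for task in self.tasks:
            if task.status == "pending":
                task.status = "killed"

    def major_timeout(self):
        # Server is unreachable: soft tasks fail, hard tasks keep waiting.
        for task in self.tasks:
            if task.status == "pending" and task.soft:
                task.status = "timed_out"

    def outstanding(self):
        return [t for t in self.tasks if t.status == "pending"]


def umount_begin(client, set_soft):
    # Proposed order: flip the client to soft first, then kill all tasks.
    if set_soft:
        client.soft = True
    client.killall()


def simulate(set_soft):
    client = Client()
    client.submit("read")            # outstanding before umount starts
    umount_begin(client, set_soft)
    client.submit("late getattr")    # request issued after killall
    client.major_timeout()
    return [t.name for t in client.outstanding()]


if __name__ == "__main__":
    print("hard:", simulate(set_soft=False))
    print("soft:", simulate(set_soft=True))
```

In this model the late request stays pending forever when the client
remains "hard", and drains after one timeout when the flag is set before
the kill. Whether the real rpc_clnt can be flipped this way safely is
exactly the open question.
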
There's also using SOFTCONN for all tasks after killall is called:
if the client can't reconnect to the server, these tasks would fail
immediately.

> If we wanted to follow a path like this (and I suspect we don't), I
> would hope that we could expose the server connection (shared among
> multiple mounts) in sysfs somewhere, and could then set "soft" (or
> "dead") on that connection, rather than having to do it on every mount
> from the particular server.

I think of your use case from last time: client shutdown should be
reliable. Seems like making "umount -f" reliable would be better for
that use case, and would work for the "make client mount points
recoverable after server dies" case too.

--
Chuck Lever