On Thu, Jun 3, 2021 at 7:57 PM NeilBrown <neilb@xxxxxxx> wrote:
>
> On Fri, 04 Jun 2021, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> >
> > When a transport gets stuck, it is desired to be able to move the tasks
> > that have been stuck/queued on that transport to another.
>
> This is interesting.....
> A long-standing problem with NFS is that it is tricky to reliably
> unmount a filesystem if the network is not responding. It is possible,
> but you need to identify all the processes blocked on the filesystem and
> SIGKILL them.
> My most recent exposure to this was when shutdown hung for someone
> because NetworkManager shutdown the wifi before NFS filesystems were
> unmounted. This is arguably a config error, but the same problem could
> happen with a power-outage instead of networkmanage breaking the wifi.
>
> It would be nice to be able to forcibly unmount filesystems. e.g. mark
> the transport as dead in such a way that all requests report EIO (or
> similar).
> This is obviously a big hammer, probably bigger than justified for use
> with "umount -f", but sometimes it is a necessary hammer.
>
> Could your work lead to being able to do this? Could I write a shutdown
> script that runs when there is no more network and no expectation of any
> network ever again, and which marks all transports as dead - and then
> wakes up all pending rpc tasks?

I thought that was something Ben was looking into in parallel with my
efforts. In this patch series I'm only addressing the case where a
transport that is not the "main" transport becomes unresponsive. I don't
allow the main transport to be put offline or removed. As you said, in
that case the tasks need to be errored out to the application.

But yes, I think as a next step we can allow the main transport to be
removed, error out its pending tasks, and allow unmounting when the
server isn't responding.

>
> Thanks,
> NeilBrown