On Fri, 2022-01-28 at 15:24 +1100, NeilBrown wrote:
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
> >
> > > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > >
> > > On Fri, 28 Jan 2022, Chuck Lever III wrote:
> > > > Hi Neil-
> > > >
> > > > > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > > > >
> > > > > If a filesystem is exported to a client with NFSv4 and that
> > > > > client holds a file open, the filesystem cannot be unmounted
> > > > > without either stopping the NFS server completely, or
> > > > > blocking all access from that client (unexporting all
> > > > > filesystems) and waiting for the lease timeout.
> > > > >
> > > > > For NFSv3 - and particularly NLM - it is possible to revoke
> > > > > all state by writing the path to the filesystem into
> > > > > /proc/fs/nfsd/unlock_filesystem.
> > > > >
> > > > > This series extends this functionality to NFSv4. With this,
> > > > > to unmount an exported filesystem it is sufficient to disable
> > > > > export of that filesystem, and then write the path to
> > > > > unlock_filesystem.
> > > > >
> > > > > I've focused mainly on NFSv4.1 and later for this. I haven't
> > > > > tested yet with NFSv4.0 which has different mechanisms for
> > > > > state management.
> > > > >
> > > > > If this series is seen as a generally acceptable approach,
> > > > > I'll look into the NFSv4.0 aspects properly and make sure it
> > > > > works there.
> > > >
> > > > I've browsed this series and need to think about:
> > > > - whether we want to enable administrative state revocation and
> > > > - whether NFSv4.0 can support that reasonably
> > > >
> > > > In particular, are there security consequences for revoking
> > > > state? What would applications see, and would that depend on
> > > > which minor version is in use? Are there data corruption risks
> > > > if this facility were to be misused?
> > >
> > > The expectation is that this would only be used after unexporting
> > > the filesystem. In that case, the client wouldn't notice any
> > > difference from the act of writing to unlock_filesystem, as the
> > > entire filesystem would already be inaccessible.
> > >
> > > If we did unlock_filesystem a filesystem that was still exported,
> > > the client would see similar behaviour to a network partition
> > > that was of longer duration than the lease time. Locks would be
> > > lost.
> > >
> > > > Also, Dai's courteous server work is something that potentially
> > > > conflicts with some of this, and I'd like to see that go in
> > > > first.
> > >
> > > I'm perfectly happy to wait for the courteous server work to land
> > > before pursuing this.
> > >
> > > > Do you have specific user requests for this feature, and if so,
> > > > what are the particular usage scenarios?
> > >
> > > It's complicated....
> > >
> > > The customer has an HA config with multiple filesystem resources
> > > which they want to be able to migrate independently. I don't
> > > think we really support that,
> >
> > With NFSv4, the protocol has mechanisms to indicate to clients that
> > a shared filesystem has migrated, and to indicate that the clients'
> > state has been migrated too. Clients can reclaim their state if the
> > servers did not migrate that state with the data. It deals with the
> > edge cases to prevent clients from stealing open/lock state during
> > the migration.
> >
> > Unexporting doesn't seem like the right approach to that.
>
> No, but it is something that should work, and should allow the
> filesystem to be unmounted. You get to keep both halves.
>
> > > but they seem to want to see if they can make it work (and
> > > it should be noted that I talk to an L2 support technician who
> > > talks to the customer representative, so I might not be getting
> > > the full story).
> > >
> > > Customer reported that even after unexporting a filesystem, they
> > > cannot then unmount it.
> >
> > My first thought is that probably clients are still pinning
> > resources on that shared filesystem. I guess that's what the
> > unlock_ interface is supposed to deal with. But that suggests
> > to me that unexporting first is not as risk-free as you
> > describe above. I think applications would notice and there
> > would be edge cases where other clients might be able to
> > grab open/lock state before the original holders could
> > re-establish their lease.
>
> Unexporting isn't risk free. It just absorbs all the risks - none
> are left for unlock_filesystem to be blamed for.
>
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed. That isn't the
> use-case I particularly want to fix. I want to be able to unmount a
> filesystem without visiting all clients and killing off applications.
>
> > > Whether or not we think that independent filesystem
> > > resources are supportable, I do think that the customer should
> > > have a clear path for unmounting a filesystem without interfering
> > > with service provided from other filesystems.
> >
> > Maybe. I guess I put that in the "last resort" category
> > rather than "this is something safe that I want to do as
> > part of daily operation" category.
>
> Agree. Definitely "last resort".
>
> > > Stopping nfsd would interfere with
> > > that service by forcing a grace-period on all filesystems.
> >
> > Yep. We have discussed implementing a per-filesystem
> > grace period in the past. That is probably a pre-requisite
> > to enabling filesystem migration.
> >
> > > The RFC explicitly supports admin-revocation of state, and that
> > > would address this specific need, so it seemed completely
> > > appropriate to provide it.
> >
> > Well the RFC also provides for migrating filesystems without
> > stopping the NFS service. If that's truly the goal, then I
> > think we want to encourage that direction instead of ripping
> > out open and lock state.
>
> I suspect that virtual IPs and network namespaces is the better
> approach for migrating exported filesystems. It isn't clear to me
> that integrated migration support in NFS would add anything of value.

No, but referrals allow you to create an arbitrary namespace out of a
set of containerised knfsd instances. It really wouldn't be hard to
convert an existing setup into something that gives you the
single-filesystem migration capabilities you're asking for.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
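
For reference, the "last resort" step discussed in this thread (after the filesystem has been unexported, write its path to /proc/fs/nfsd/unlock_filesystem, then unmount) could be sketched roughly as below. This is a minimal illustration, not code from the series: the function name `revoke_state` and its `control` parameter are hypothetical, and the unexport and unmount steps are left to the administrator. Per the thread, the existing interface covers NFSv3/NLM state; whether NFSv4 state is also revoked is assumed to depend on a kernel that carries the proposed patches.

```python
import os

# Control file named in the thread; writing a path here asks nfsd to
# drop the state it holds on that filesystem.
UNLOCK_CONTROL = "/proc/fs/nfsd/unlock_filesystem"


def revoke_state(fs_path, control=UNLOCK_CONTROL):
    """Write fs_path, newline-terminated, to the nfsd control file.

    Hypothetical helper for illustration. The 'control' parameter exists
    only so the function can be pointed at an ordinary file when testing.
    Assumes the filesystem has already been unexported and that the
    caller has permission to write to the control file.
    """
    if not os.path.isabs(fs_path):
        raise ValueError("expected an absolute path: %r" % (fs_path,))
    with open(control, "w") as f:
        # Same effect as: echo /path > /proc/fs/nfsd/unlock_filesystem
        f.write(fs_path + "\n")
```

A caller would unexport the filesystem first, call `revoke_state("/srv/export")` with the real mount point in place of that placeholder, and only then attempt the unmount.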