> On Jan 27, 2022, at 11:24 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>
>>> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>
>>> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>>> Hi Neil-
>>>>
>>>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>>
>>>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>>>> a file open, the filesystem cannot be unmounted without either stopping the
>>>>> NFS server completely, or blocking all access from that client
>>>>> (unexporting all filesystems) and waiting for the lease timeout.
>>>>>
>>>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>>>
>>>>> This series extends this functionality to NFSv4. With this, to unmount
>>>>> an exported filesystem it is sufficient to disable export of that
>>>>> filesystem, and then write the path to unlock_filesystem.
>>>>>
>>>>> I've focused mainly on NFSv4.1 and later for this. I haven't tested
>>>>> yet with NFSv4.0 which has different mechanisms for state management.
>>>>>
>>>>> If this series is seen as a generally acceptable approach, I'll look into
>>>>> the NFSv4.0 aspects properly and make sure it works there.
>>>>
>>>> I've browsed this series and need to think about:
>>>> - whether we want to enable administrative state revocation and
>>>> - whether NFSv4.0 can support that reasonably
>>>>
>>>> In particular, are there security consequences for revoking
>>>> state? What would applications see, and would that depend on
>>>> which minor version is in use? Are there data corruption risks
>>>> if this facility were to be misused?
>>>
>>> The expectation is that this would only be used after unexporting the
>>> filesystem.
>>> In that case, the client wouldn't notice any difference
>>> from the act of writing to unlock_filesystem, as the entire filesystem
>>> would already be inaccessible.
>>>
>>> If we did unlock_filesystem a filesystem that was still exported, the
>>> client would see similar behaviour to a network partition that was of
>>> longer duration than the lease time. Locks would be lost.
>>>
>>>>
>>>> Also, Dai's courteous server work is something that potentially
>>>> conflicts with some of this, and I'd like to see that go in
>>>> first.
>>>
>>> I'm perfectly happy to wait for the courteous server work to land before
>>> pursuing this.
>>>
>>>>
>>>> Do you have specific user requests for this feature, and if so,
>>>> what are the particular usage scenarios?
>>>
>>> It's complicated....
>>>
>>> The customer has an HA config with multiple filesystem resources which
>>> they want to be able to migrate independently. I don't think we really
>>> support that,
>>
>> With NFSv4, the protocol has mechanisms to indicate to clients that
>> a shared filesystem has migrated, and to indicate that the clients'
>> state has been migrated too. Clients can reclaim their state if the
>> servers did not migrate that state with the data. It deals with the
>> edge cases to prevent clients from stealing open/lock state during
>> the migration.
>>
>> Unexporting doesn't seem like the right approach to that.
>
> No, but it is something that should work, and should allow the filesystem
> to be unmounted. You get to keep both halves.
>
>>
>>
>>> but they seem to want to see if they can make it work (and
>>> it should be noted that I talk to an L2 support technician who talks to
>>> the customer representative, so I might not be getting the full story).
>>>
>>> Customer reported that even after unexporting a filesystem, they cannot
>>> then unmount it.
>>
>> My first thought is that probably clients are still pinning
>> resources on that shared filesystem.
>> I guess that's what the
>> unlock_ interface is supposed to deal with. But that suggests
>> to me that unexporting first is not as risk-free as you
>> describe above. I think applications would notice and there
>> would be edge cases where other clients might be able to
>> grab open/lock state before the original holders could
>> re-establish their lease.
>
> Unexporting isn't risk free. It just absorbs all the risks - none are
> left for unlock_filesystem to be blamed for.
>
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed. That isn't the use-case
> I particularly want to fix. I want to be able to unmount a filesystem
> without visiting all clients and killing off applications.

OK. The top level goal then is simply to provide another arrow
in the administrator's quiver to manage a large NFS server. It
brings NFSv4 closer to par with the NFSv3 toolset.

I say we have enough motivation for a full proof of concept.
I would like to see support for minor version 0 added, and a
fuller discussion of the consequences for clients and
applications will be needed (at least for the purpose of
administrator documentation).


>>> Whether or not we think that independent filesystem
>>> resources is supportable, I do think that the customer should have a
>>> clear path for unmounting a filesystem without interfering with service
>>> provided from other filesystems.
>>
>> Maybe. I guess I put that in the "last resort" category
>> rather than "this is something safe that I want to do as
>> part of daily operation" category.
>
> Agree. Definitely "last resort".
>
>>
>>
>>> Stopping nfsd would interfere with
>>> that service by forcing a grace-period on all filesystems.
>>
>> Yep. We have discussed implementing a per-filesystem
>> grace period in the past. That is probably a pre-requisite
>> to enabling filesystem migration.
>>
>>
>>> The RFC explicitly supports admin-revocation of state, and that would
>>> address this specific need, so it seemed completely appropriate to
>>> provide it.
>>
>> Well the RFC also provides for migrating filesystems without
>> stopping the NFS service. If that's truly the goal, then I
>> think we want to encourage that direction instead of ripping
>> out open and lock state.
>
> I suspect that virtual IPs and network namespaces is the better approach
> for migrating exported filesystems. It isn't clear to me that
> integrated migration support in NFS would add anything of value.
>
> But as I think I said to Bruce - seamless migration support is not my
> goal here. In the context where a site has multiple filesystems that
> are all NFS exported, there is a case for being able to forcibly
> unexport/unmount one filesystem without affecting the others. That is
> my aim here.

My initial impulse is to better understand what is preventing
the unexported filesystem from being unmounted. Better
observability there could potentially be of value.


> Thanks,
> NeilBrown
>
>
>>
>> Also, it's not clear to me that clients support administrative
>> revocation as broadly as we might like. The Linux NFS client
>> does have support for NFSv4 migration, though it's a bit
>> fallow these days.
>>
>>
>>> As an aside ... I'd like to be able to suggest that the customer use
>>> network namespaces for the different filesystem resources. Each could
>>> be in its own namespace and managed independently. However I don't
>>> think we have good admin infrastructure for that do we?
>>
>> None that I'm aware of. SteveD is the best person to ask.
>>
>>
>>> I'd like to be able to say "set up these 2 or 3 config files and run
>>> systemctl start nfs-server@foo and the 'foo' network namespace will be
>>> created, configured, and have an nfs server running".
>>> Do we have anything approaching that? Even a HOWTO ??
>>
>> Interesting idea! But doesn't ring a bell.
>>
>> --
>> Chuck Lever

--
Chuck Lever
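
For reference, the sequence discussed in this thread (disable the export, write the path to unlock_filesystem, then unmount) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the `clients` argument, and the injectable `run` and `control_file` parameters are hypothetical and not part of the series, and on kernels without the proposed patches the write would only be expected to release NFSv3/NLM state.

```python
"""Sketch: unexport one filesystem, revoke nfsd state for it, then unmount."""
import subprocess

# Control file named in the thread. Whether writing here also revokes NFSv4
# state depends on the proposed series being applied (assumption).
UNLOCK_FILESYSTEM = "/proc/fs/nfsd/unlock_filesystem"


def revoke_state(fs_path, control_file=UNLOCK_FILESYSTEM):
    """Write the filesystem path, newline-terminated, as `echo path > file` would."""
    with open(control_file, "w") as f:
        f.write(f"{fs_path}\n")


def force_unmount(fs_path, clients, run=subprocess.run,
                  control_file=UNLOCK_FILESYSTEM):
    """Unexport fs_path from each client spec, revoke state, then unmount.

    `clients` is a hypothetical list of exportfs client specs; `run` is
    injectable so the sequence can be exercised without root privileges.
    """
    for client in clients:
        # exportfs -u client:/path removes a single export entry.
        run(["exportfs", "-u", f"{client}:{fs_path}"], check=True)
    revoke_state(fs_path, control_file)
    run(["umount", fs_path], check=True)
```

A call such as `force_unmount("/srv/export1", ["client1.example.com"])` (both values hypothetical) would need to run as root on the server. As noted above, this belongs in the "last resort" category: clients holding opens or locks on that filesystem would lose them.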