Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 28 Jan 2022 04:35:26 +0000

On Fri, 2022-01-28 at 15:24 +1100, NeilBrown wrote:
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
> > 
> > > On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > > 
> > > On Fri, 28 Jan 2022, Chuck Lever III wrote:
> > > > Hi Neil-
> > > > 
> > > > > On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > > > > 
> > > > > If a filesystem is exported to a client with NFSv4 and that client
> > > > > holds a file open, the filesystem cannot be unmounted without either
> > > > > stopping the NFS server completely, or blocking all access from that
> > > > > client (unexporting all filesystems) and waiting for the lease timeout.
> > > > > 
> > > > > For NFSv3 - and particularly NLM - it is possible to revoke all state
> > > > > by writing the path to the filesystem into
> > > > > /proc/fs/nfsd/unlock_filesystem.
> > > > > 
> > > > > This series extends this functionality to NFSv4.  With this, to
> > > > > unmount an exported filesystem it is sufficient to disable export of
> > > > > that filesystem, and then write the path to unlock_filesystem.
> > > > > 
> > > > > I've focused mainly on NFSv4.1 and later for this.  I haven't yet
> > > > > tested with NFSv4.0, which has different mechanisms for state
> > > > > management.
> > > > > 
> > > > > If this series is seen as a generally acceptable approach, I'll look
> > > > > into the NFSv4.0 aspects properly and make sure it works there.
> > > > 
> > > > I've browsed this series and need to think about:
> > > > - whether we want to enable administrative state revocation and
> > > > - whether NFSv4.0 can support that reasonably
> > > > 
> > > > In particular, are there security consequences for revoking
> > > > state? What would applications see, and would that depend on
> > > > which minor version is in use? Are there data corruption risks
> > > > if this facility were to be misused?
> > > 
> > > The expectation is that this would only be used after unexporting the
> > > filesystem.  In that case, the client wouldn't notice any difference
> > > from the act of writing to unlock_filesystem, as the entire filesystem
> > > would already be inaccessible.
> > > 
> > > If we used unlock_filesystem on a filesystem that was still exported,
> > > the client would see behaviour similar to a network partition that
> > > lasted longer than the lease time.  Locks would be lost.
> > > 
> > > > 
> > > > Also, Dai's courteous server work is something that potentially
> > > > conflicts with some of this, and I'd like to see that go in
> > > > first.
> > > 
> > > I'm perfectly happy to wait for the courteous server work to land
> > > before pursuing this.
> > > 
> > > > 
> > > > Do you have specific user requests for this feature, and if so,
> > > > what are the particular usage scenarios?
> > > 
> > > It's complicated....
> > > 
> > > The customer has an HA config with multiple filesystem resources which
> > > they want to be able to migrate independently.  I don't think we
> > > really support that,
> > 
> > With NFSv4, the protocol has mechanisms to indicate to clients that
> > a shared filesystem has migrated, and to indicate that the clients'
> > state has been migrated too. Clients can reclaim their state if the
> > servers did not migrate that state with the data. It deals with the
> > edge cases to prevent clients from stealing open/lock state during
> > the migration.
> > 
> > Unexporting doesn't seem like the right approach to that.
> 
> No, but it is something that should work, and should allow the
> filesystem to be unmounted.  You get to keep both halves.
> 
> > 
> > 
> > > but they seem to want to see if they can make it work (and it
> > > should be noted that I talk to an L2 support technician who talks to
> > > the customer representative, so I might not be getting the full
> > > story).
> > > 
> > > The customer reported that even after unexporting a filesystem, they
> > > could not then unmount it.
> > 
> > My first thought is that probably clients are still pinning
> > resources on that shared filesystem. I guess that's what the
> > unlock_ interface is supposed to deal with. But that suggests
> > to me that unexporting first is not as risk-free as you
> > describe above. I think applications would notice and there
> > would be edge cases where other clients might be able to
> > grab open/lock state before the original holders could
> > re-establish their lease.
> 
> Unexporting isn't risk free.  It just absorbs all the risks - none are
> left for unlock_filesystem to be blamed for.
> 
> An application is certainly not guaranteed to recover if you unexport a
> filesystem and later re-export it.  That isn't the use case I
> particularly want to fix.  I want to be able to unmount a filesystem
> without visiting all clients and killing off applications.
> 
> > 
> > 
> > > Whether or not we think that independent filesystem resources are
> > > supportable, I do think that the customer should have a clear path for
> > > unmounting a filesystem without interfering with service provided from
> > > other filesystems.
> > 
> > Maybe. I guess I put that in the "last resort" category
> > rather than "this is something safe that I want to do as
> > part of daily operation" category.
> 
> Agree.  Definitely "last resort".
> 
> > 
> > 
> > > Stopping nfsd would interfere with
> > > that service by forcing a grace-period on all filesystems.
> > 
> > Yep. We have discussed implementing a per-filesystem
> > grace period in the past. That is probably a pre-requisite
> > to enabling filesystem migration.
> > 
> > 
> > > The RFC explicitly supports admin-revocation of state, and that would
> > > address this specific need, so it seemed completely appropriate to
> > > provide it.
> > 
> > Well the RFC also provides for migrating filesystems without
> > stopping the NFS service. If that's truly the goal, then I
> > think we want to encourage that direction instead of ripping
> > out open and lock state.
> 
> I suspect that virtual IPs and network namespaces are the better
> approach for migrating exported filesystems.  It isn't clear to me that
> integrated migration support in NFS would add anything of value.

No, but referrals allow you to create an arbitrary namespace out of a
set of containerised knfsd instances. It really wouldn't be hard to
convert an existing setup into something that gives you the single-
filesystem migration capabilities you're asking for.
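For concreteness, the admin step discussed in this thread (unexport first, e.g. with `exportfs -u`, then write the path to /proc/fs/nfsd/unlock_filesystem) might look something like the minimal C sketch below. It is an illustration only, not part of the patch series: the helper name `unlock_filesystem()` and its `ctl` parameter are hypothetical, and it assumes the kernel handler accepts a single newline-terminated path delivered in one write(2), which is what `echo /path > /proc/fs/nfsd/unlock_filesystem` produces. Whether the write also drops NFSv4 state depends on this series being applied; otherwise it covers the NFSv3/NLM state described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Control file named in this thread. */
#define UNLOCK_FS_CTL "/proc/fs/nfsd/unlock_filesystem"

/*
 * Hypothetical helper: ask nfsd to drop server-side state for the
 * filesystem containing `path`. Assumes the filesystem has already been
 * unexported. `ctl` is a parameter only so the sketch can be exercised
 * against an ordinary file. Returns 0 on success, -errno on failure.
 */
int unlock_filesystem(const char *ctl, const char *path)
{
    char buf[4096];
    size_t len;
    ssize_t n;
    int fd;

    /* Assumed: the kernel parses a single word, so refuse relative paths
     * and whitespace rather than trying to escape it. */
    if (path == NULL || path[0] != '/' || strpbrk(path, " \t\n") != NULL)
        return -EINVAL;

    len = strlen(path);
    if (len >= sizeof(buf))
        return -ENAMETOOLONG;

    /* Terminate with '\n', as `echo PATH > unlock_filesystem` would. */
    memcpy(buf, path, len);
    buf[len++] = '\n';

    fd = open(ctl, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* Deliver the whole request in a single write(2). */
    n = write(fd, buf, len);
    if (n < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    close(fd);
    return (size_t)n == len ? 0 : -EIO;
}
```

A real caller would pass UNLOCK_FS_CTL as `ctl` and needs root; the sketch deliberately does not run `exportfs` itself, since unexporting is a separate, prior step.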

> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx