Re: [RFC PATCH 0/4] nfsd: allow NFSv4 state to be revoked.

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 28 Jan 2022 13:46:45 +0000

> On Jan 27, 2022, at 11:24 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>> 
>>> On Jan 27, 2022, at 5:41 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>> 
>>> On Fri, 28 Jan 2022, Chuck Lever III wrote:
>>>> Hi Neil-
>>>> 
>>>>> On Jan 26, 2022, at 11:58 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>> 
>>>>> If a filesystem is exported to a client with NFSv4 and that client holds
>>>>> a file open, the filesystem cannot be unmounted without either stopping the
>>>>> NFS server completely, or blocking all access from that client
>>>>> (unexporting all filesystems) and waiting for the lease timeout.
>>>>> 
>>>>> For NFSv3 - and particularly NLM - it is possible to revoke all state by
>>>>> writing the path to the filesystem into /proc/fs/nfsd/unlock_filesystem.
>>>>> 
>>>>> This series extends this functionality to NFSv4.  With this, to unmount
>>>>> an exported filesystem it is sufficient to disable export of that
>>>>> filesystem, and then write the path to unlock_filesystem.
>>>>> 
>>>>> I've focused mainly on NFSv4.1 and later for this.  I haven't tested
>>>>> yet with NFSv4.0 which has different mechanisms for state management.
>>>>> 
>>>>> If this series is seen as a generally acceptable approach, I'll look into
>>>>> the NFSv4.0 aspects properly and make sure it works there.
>>>> 
>>>> I've browsed this series and need to think about:
>>>> - whether we want to enable administrative state revocation and
>>>> - whether NFSv4.0 can support that reasonably
>>>> 
>>>> In particular, are there security consequences for revoking
>>>> state? What would applications see, and would that depend on
>>>> which minor version is in use? Are there data corruption risks
>>>> if this facility were to be misused?
>>> 
>>> The expectation is that this would only be used after unexporting the
>>> filesystem.  In that case, the client wouldn't notice any difference
>>> from the act of writing to unlock_filesystem, as the entire filesystem
>>> would already be inaccessible.
>>> 
>>> If we did unlock_filesystem a filesystem that was still exported, the
>>> client would see similar behaviour to a network partition that was of
>>> longer duration than the lease time.   Locks would be lost.
>>> 
>>>> 
>>>> Also, Dai's courteous server work is something that potentially
>>>> conflicts with some of this, and I'd like to see that go in
>>>> first.
>>> 
>>> I'm perfectly happy to wait for the courteous server work to land before
>>> pursuing this.
>>> 
>>>> 
>>>> Do you have specific user requests for this feature, and if so,
>>>> what are the particular usage scenarios?
>>> 
>>> It's complicated....
>>> 
>>> The customer has an HA config with multiple filesystem resources which
>>> they want to be able to migrate independently.  I don't think we really
>>> support that,
>> 
>> With NFSv4, the protocol has mechanisms to indicate to clients that
>> a shared filesystem has migrated, and to indicate that the clients'
>> state has been migrated too. Clients can reclaim their state if the
>> servers did not migrate that state with the data. It deals with the
>> edge cases to prevent clients from stealing open/lock state during
>> the migration.
>> 
>> Unexporting doesn't seem like the right approach to that.
> 
> No, but it is something that should work, and should allow the filesystem
> to be unmounted.  You get to keep both halves.
> 
>> 
>> 
>>> but they seem to want to see if they can make it work (and
>>> it should be noted that I talk to an L2 support technician who talks to
>>> the customer representative, so I might not be getting the full story).
>>> 
>>> Customer reported that even after unexporting a filesystem, they cannot
>>> then unmount it.
>> 
>> My first thought is that probably clients are still pinning
>> resources on that shared filesystem. I guess that's what the
>> unlock_ interface is supposed to deal with. But that suggests
>> to me that unexporting first is not as risk-free as you
>> describe above. I think applications would notice and there
>> would be edge cases where other clients might be able to
>> grab open/lock state before the original holders could
>> re-establish their lease.
> 
> Unexporting isn't risk free.  It just absorbs all the risks - none are
> left for unlock_filesystem to be blamed for.
> 
> Expecting an application to recover if you unexport a filesystem and
> later re-export it is certainly not guaranteed.  That isn't the use-case
> I particularly want to fix.  I want to be able to unmount a filesystem
> without visiting all clients and killing off applications.

OK. The top level goal then is simply to provide another
arrow in the administrator's quiver to manage a large
NFS server. It brings NFSv4 closer to par with the NFSv3
toolset.

I say we have enough motivation for a full proof of
concept. I would like to see support for minor version 0
added, and a fuller discussion of the consequences for
clients and applications will be needed (at least for
the purpose of administrator documentation).
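
For the administrator documentation, the sequence under discussion
(unexport, write the path to unlock_filesystem, then unmount) could be
sketched roughly as below. This is only an illustration, not anything
from the series: the function name, its parameters, and the injectable
"run" hook are hypothetical, and it assumes a kernel where writing to
unlock_filesystem also revokes NFSv4 state, which is what this RFC
proposes. On a kernel without that, the write releases only NLM locks.

```python
import subprocess

# Path named in the cover letter.
UNLOCK_FILE = "/proc/fs/nfsd/unlock_filesystem"


def revoke_and_unmount(mount_point, export_specs, run=None,
                       unlock_file=UNLOCK_FILE):
    """Hypothetical helper: unexport, revoke state, then unmount.

    export_specs is a list of "client:/path" strings as accepted by
    "exportfs -u". run and unlock_file are injectable so the sequence
    can be exercised without touching a real server.
    """
    if run is None:
        # Default: execute the command and raise if it fails.
        def run(cmd):
            subprocess.run(cmd, check=True)

    # 1. Disable the export so no new state can be established.
    for spec in export_specs:
        run(["exportfs", "-u", spec])

    # 2. Ask nfsd to drop the state it still holds on this filesystem.
    #    The path is written newline-terminated, as "echo" would do.
    with open(unlock_file, "w") as f:
        f.write(mount_point + "\n")

    # 3. With nothing pinning the filesystem, unmount it.
    run(["umount", mount_point])
```

Something like revoke_and_unmount("/srv/a", ["client1:/srv/a"]) would
then be the "last resort" path, with the caveat that clients lose
their opens and locks on that filesystem.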

>>> Whether or not we think that independent filesystem
>>> resources is supportable, I do think that the customer should have a
>>> clear path for unmounting a filesystem without interfering with service
>>> provided from other filesystems.
>> 
>> Maybe. I guess I put that in the "last resort" category
>> rather than "this is something safe that I want to do as
>> part of daily operation" category.
> 
> Agree.  Definitely "last resort".
> 
>> 
>> 
>>> Stopping nfsd would interfere with
>>> that service by forcing a grace-period on all filesystems.
>> 
>> Yep. We have discussed implementing a per-filesystem
>> grace period in the past. That is probably a pre-requisite
>> to enabling filesystem migration.
>> 
>> 
>>> The RFC explicitly supports admin-revocation of state, and that would
>>> address this specific need, so it seemed completely appropriate to
>>> provide it.
>> 
>> Well the RFC also provides for migrating filesystems without
>> stopping the NFS service. If that's truly the goal, then I
>> think we want to encourage that direction instead of ripping
>> out open and lock state.
> 
> I suspect that virtual IPs and network namespaces are the better approach
> for migrating exported filesystems.  It isn't clear to me that
> integrated migration support in NFS would add anything of value.
> 
> But as I think I said to Bruce - seamless migration support is not my
> goal here.  In the context where a site has multiple filesystems that
> are all NFS exported, there is a case for being able to forcibly
> unexport/unmount one filesystem without affecting the others.  That is
> my aim here.

My initial impulse is to better understand what is preventing
the unexported filesystem from being unmounted. Better
observability there could potentially be of value.

> Thanks,
> NeilBrown
> 
> 
>> 
>> Also, it's not clear to me that clients support administrative
>> revocation as broadly as we might like. The Linux NFS client
>> does have support for NFSv4 migration, though it's a bit
>> fallow these days.
>> 
>> 
>>> As an aside ...  I'd like to be able to suggest that the customer use
>>> network namespaces for the different filesystem resources.  Each could
>>> be in its own namespace and managed independently.  However I don't
>>> think we have good admin infrastructure for that, do we?
>> 
>> None that I'm aware of. SteveD is the best person to ask.
>> 
>> 
>>> I'd like to be able to say "set up these 2 or 3 config files and run 
>>> systemctl start nfs-server@foo and the 'foo' network namespace will be
>>> created, configured, and have an nfs server running".
>>> Do we have anything approaching that?  Even a HOWTO ??
>> 
>> Interesting idea! But doesn't ring a bell.
>> 
>> --
>> Chuck Lever

--
Chuck Lever