On Tue, 15 Mar 2022, Chuck Lever III wrote:
> Hi Neil-
>
> > On Mar 13, 2022, at 9:04 PM, NeilBrown <neilb@xxxxxxx> wrote:
> >
> >
> > When mounting NFS filesystem in a network namespace using v4, some care
> > must be taken to ensure a unique and stable client identity.  Similar
> > case is needed for NFS-root and other situations.
> >
> > Add documentation explaining the requirements for the NFS identity in
> > these situations.
> >
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >
> > I think I've address most of the feedback, but please forgive and remind
> > if I missed something.
> > NeilBrown
> >
> >  utils/mount/nfs.man | 109 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 108 insertions(+), 1 deletion(-)
> >
> > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> > index d9f34df36b42..5f15abe8cf72 100644
> > --- a/utils/mount/nfs.man
> > +++ b/utils/mount/nfs.man
> > @@ -1,7 +1,7 @@
> >  .\"@(#)nfs.5"
> >  .TH NFS 5 "9 October 2012"
> >  .SH NAME
> > -nfs \- fstab format and options for the
> > +nfs \- fstab format and configuration for the
> >  .B nfs
> >  file systems
>
> Suggest "configuration for nfs file systems" (remove "the")

Agreed.

>
> >  .SH SYNOPSIS
> > @@ -1844,6 +1844,113 @@ export pathname, but not both, during a remount.  For example,
> >  merges the mount option
> >  .B ro
> >  with the mount options already saved on disk for the NFS server mounted at /mnt.
> > +.SH "NFS CLIENT IDENTIFIER"
> > +NFSv4 requires that the client present a unique identifier to the server
> > +to be used to track state such as file locks.  By default Linux NFS uses
> > +the host name, as configured at the time of the first NFS mount,
> > +together with some fixed content such as the name "Linux NFS" and the
> > +particular protocol version.  When the hostname is guaranteed to be
> > +unique among all client which access the same server this is sufficient.
> > +If hostname uniqueness cannot be assumed, extra identity information
> > +must be provided.
>
> The last sentence is made ambiguous by the use of passive voice.
>
> Suggest: "When hostname uniqueness cannot be guaranteed, the client
> administrator must provide extra identity information."

Why must the client administrator do this?  Why can't some automated
tool do this?  Or some container-building environment?

That's an advantage of the passive voice: you don't need to assign
responsibility for the verb.

>
> I have a problem with basing our default uniqueness guarantee on
> hostnames "most of the time" hoping it will all work out.  There
> are simply too many common cases where hostname stability can't be
> relied upon.  Our sustaining teams will happily tell us this hope
> hasn't so far been born out.

Maybe it has not been born out because there is no documented
requirement for it that we can point people to.  Clearly containers
that use NFS are not currently all configured well to do this.  Some
change is needed.  Maybe adding a unique host name is the easiest
change ... or maybe not.

Surely NFS is not the *only* service that uses the host name.
Encouraging the use of unique host names might benefit others.

The practical reality is that a great many NFS client installations do
currently depend on unique host names - after all, it actually works.
Is it really so unreasonable to try to encourage the exceptions to fit
the common pattern better?

>
> I also don't feel that nfs(5) is an appropriate place for this level
> of detail.  Documentation/filesystems/nfs/ is more appropriate IMO.
> In general, man pages are good for quick summaries, not for
> explainers.  Here, it reads like "you, a user, are going to have to
> do this thing that is like filling out a tax form" -- in reality it
> should be information that should be:
>
>  - Ignorable by most folks
>  - Used by distributors to add value by automating set up
>  - Used for debugging large client installations

nfs(5) contains sections on TRANSPORT METHODS, DATA AND METADATA
COHERENCE, and SECURITY CONSIDERATIONS.  Is this section really out of
place?

I could agree that all of these sections belong in "section 7"
(Overview, conventions, and miscellaneous) rather than "section 5"
(File formats and configuration files), but we don't have nfs.7 (yet).
I think section 7 is a reasonable fit for your 3 points above.

I don't agree that Documentation/filesystems/nfs/ is sufficient.  That
is (from my perspective) primarily of interest to kernel developers.
The whole point of this exercise is that we need to reach people
outside of that group.

>
> Maybe I'm just stating this to understand the purpose of this
> patch, but it could also be used as an "Intended audience"
> disclaimer in this new section.

OK, so the "purpose of this patch" relates in part to a comment you
made earlier, which I include here:

> Since it is just a line or two of code, it might be of little
> harm just to go with separate implementations for now and stop
> talking about it.  If it sucks, we can fix the suckage.
>
> Who volunteers to implement this mechanism in mount.nfs ?

I don't think this is the best next step.  I think we need to get some
container system developer to contribute here.  So far we only have
second-hand anecdotes about problems.  I think the most concrete is
from Ben suggesting that in at least one container system, using
/etc/machine-id is a good idea.

I don't think we can change nfs-utils (whether mount.nfs or mount.conf
or some other way) to set identity from /etc/machine-id for everyone.
So we would at least need that container system to request the change.
How would they like to do that?

I suggest that we explain the problem to representatives of the
various container communities that we have contact with (Well...
"you", more than "we", as I don't have contacts).  We could use the
documentation I provided to clearly present the problem.  Then ask:

 - would you like to just run some shell code (see examples, and the
   sketch below)
 - or would you like to provide an /etc/nfs.conf.d/my-container.conf
 - or would you like to run a tool that we provide
 - or is there already a push to provide unique container hostnames,
   and is this the incentive you need to help that push across the
   line?
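To be concrete about the first option, I imagine something no bigger
than this, run once by the container's start-up before the first NFS
mount in its network namespace.  Purely illustrative - the use of
/etc/machine-id (per Ben's suggestion) and the random fallback are
whatever the container system prefers:

    #!/bin/sh
    # Set a per-container NFS client identity in this network namespace.
    id_file=/sys/fs/nfs/client/net/identifier
    if [ -r /etc/machine-id ]; then
        # stable per-container identity, obfuscated via a name-based UUID
        uuidgen --sha1 --namespace @url -N "nfs:$(cat /etc/machine-id)" > "$id_file"
    else
        # no stable identity available, so at least ensure uniqueness
        uuidgen --random > "$id_file"
    fi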
If we have someone from $CONTAINER_COMMUNITY say "if you do this
thing, then we will use it", then that would be hard to argue with.
If we could get two or three different communities to comment, I
expect the best answer would become a lot more obvious.

But first we, ourselves, need to agree on the document :-)

>
> > +.PP
> > +Some situations which are known to be problematic with respect to unique
> > +host names include:
>
> A little wordy.
>
> Suggest: "Situations known to be problematic with respect to unique
> hostnames include:"

Yep.

>
> If this will eventually become part of nfs(5), I would first run
> this patch by documentation experts, because they might have a
> preference for "hostnames" over "host names" and "namespaces" over
> "name-spaces".  Usage of these terms throughout this patch is not
> consistent.

I've made it consistently "hostname" and "namespace", which is
consistent with the rest of the document.

>
> > +.IP \- 2
> > +NFS-root (diskless) clients, where the DCHP server (or equivalent) does
> > +not provide a unique host name.
>
> Suggest this addition:
>
> .IP \- 2
>
> Dynamically-assigned hostnames, where the hostname can be changed after
> a client reboot, while the client is booted, or if a client often
> repeatedly connects to multiple networks (for example if it is moved
> from home to an office every day).

This is a different kettle of fish.  The hostname is *always* included
in the identifier.  If it isn't stable, then the identifier isn't
stable.

I saw in the history that when you introduced the module parameter it
replaced the hostname.  This caused problems in containers (which had
different host names) so Trond changed it so the module parameter
supplemented the hostname.

If hostnames are really so poorly behaved I can see there might be a
case to suppress the hostname, but we don't have that option in
current kernels.  Should we add it?

>
> > +.IP \- 2
> > +"containers" within a single Linux host.  If each container has a separate
> > +network namespace, but does not use the UTS namespace to provide a unique
> > +host name, then there can be multiple effective NFS clients with the
> > +same host name.
> > +.IP \= 2
>
> .IP \- 2

Thanks.

>
> > +Clients across multiple administrative domains that access a common NFS
> > +server.  If assignment of host name is devolved to separate domains,
>
> I don't recognize the phrase "assignment is devolved to separate domains".
> Can you choose a friendlier way of saying this?

If hostnames are not assigned centrally then uniqueness cannot be
guaranteed unless a domain name is included in the hostname.

>
> > +uniqueness cannot be guaranteed, unless a domain name is included in the
> > +host name.
> > +.SS "Increasing Client Uniqueness"
> > +Apart from the host name, which is the preferred way to differentiate
> > +NFS clients, there are two mechanisms to add uniqueness to the
> > +client identifier.
> > +.TP
> > +.B nfs.nfs4_unique_id
> > +This module parameter can be set to an arbitrary string at boot time, or
> > +when the
> > +.B nfs
> > +module is loaded.  This might be suitable for configuring diskless clients.
>
> Suggest: "This is suitable for"

OK
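(To make "at boot time" concrete, I'm thinking of nothing more than a
kernel command line entry or a modprobe option - something like

    nfs.nfs4_unique_id=<unique-string>

on the kernel command line, or

    options nfs nfs4_unique_id=<unique-string>

in a file under /etc/modprobe.d/, where <unique-string> is just a
placeholder for whatever unique string the installer chooses.)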
>
> > +.TP
> > +.B /sys/fs/nfs/client/net/identifier
> > +This virtual file (available since Linux 5.3) is local to the network
> > +name-space in which it is accessed and so can provided uniqueness between
> > +network namespaces (containers) when the hostname remains uniform.
>
> ^provided^provide
>
> ^between^amongst
>
> and the clause at the end confused me.
>
> Suggest: "in which it is accessed and thus can provide uniqueness
> amongst network namespaces (containers)."

The clause at the end was simply emphasising that the identifier is
only needed if the hostname does not vary across containers.  I have
removed it.

>
> > +.RS
> > +.PP
> > +This value is empty on name-space creation.
> > +If the value is to be set, that should be done before the first
> > +mount.  If the container system has access to some sort of per-container
> > +identity then that identity, possibly obfuscated as a UUID is privacy is
> > +needed, can be used.  Combining the identity with the name of the
> > +container systems would also help.
>
> I object to recommending obfuscation via a UUID.
>
> 1. This is confusing because there has been no mention of any
>    persistence requirement so far.  At this point, a reader
>    might think that the client can simply convert the hostname
>    and netns identifier every time it boots.  However this is
>    only OK to do if these things are guaranteed not to change
>    during the lifetime of a client.  In a world where a majority
>    of systems get their hostnames dynamically, I think this is
>    a shaky foundation.

If the hostname changes after boot (weird concept .. does that really
happen?) that is irrelevant.  The hostname is copied at boot by NFS,
and if it is included in the /sys/fs/nfs/client/identifier (which
would be pointless, but not harmful) it has again been copied.

If it is different on subsequent boots, then that is a big problem and
not one that we can currently fix.  ....except that a non-persistent
client identifier isn't an enormous problem, just a possible cause of
delays.

>
> 2. There's no requirement that this uniquifier be in the form
>    of a UUID anywhere in specifications, and the Linux client
>    itself does not add such a requirement.  (You suggested
>    before that we should start by writing down requirements.
>    Using a UUID ain't a requirement).

The requirement here is that /etc/machine-id is documented as
requiring obfuscation.  uuidgen is a convenient way to provide
obfuscation.  That is all I was trying to say.

>
> Linux chooses to implement its uniquifer with a UUID because
> it is assumed we are using a random UUID (rather than a
> name-based or time-based UUID).  A random UUID has strong
> global uniqueness guarantees, which guarantees the client
> identifier will always be unique amongst clients in nearly
> all situations for nearly no cost.

"Linux chooses" - what does that mean?  I've lost the thread here,
sorry.

> If we want to create a good uniquifier here, then combine the
> hostname, netns identity, and/or the host's machine-id and then
> hash that blob with a known strong digest algorithm like
> SHA-256.  A man page must not recommend the use of deprecated or
> insecure obfuscation mechanisms.

I didn't realize the hash that uuidgen uses was deprecated.  Is there
some better way to provide an app-specific obfuscation of a string
from the command line?  Maybe

    echo nfs-id:`cat /etc/machine-id` | sha256sum

??
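Fleshed out a little (just a sketch - the "nfs-id:" prefix is an
arbitrary application-specific label, and only the hex digest, not
sha256sum's trailing " -", should be written):

    ( echo -n "nfs-id:"; cat /etc/machine-id ) | sha256sum | \
        awk '{print $1}' > /sys/fs/nfs/client/net/identifier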
>
> The man page can suggest a random-based UUID as long as it
> states plainly that such UUIDs have global uniqueness guarantees
> that make them suitable for this purpose.  We're using a UUID
> for its global uniqueness properties, not because of its
> appearance.

So I could use "/etc/nfsv4-identity" instead of "/etc/nfs4-uuid".
What else should I change/add?

>
> > For example:
> > +.RS 4
> > +echo "ip-netns:`ip netns identify`" \\
> > +.br
> > +   > /sys/fs/nfs/client/net/identifier
> > +.br
> > +uuidgen --sha1 --namespace @url \\
> > +.br
> > +   -N "nfs:`cat /etc/machine-id`" \\
> > +.br
> > +   > /sys/fs/nfs/client/net/identifier
> > +.RE
> > +If the container system provides no stable name,
> > +but does have stable storage,
>
> Here's the first mention of "stable".  It needs some
> introduction far above.

True.  So the first para becomes:

    NFSv4 requires that the client present a stable unique identifier
    to the server to be used to track state such as file locks.  By
    default Linux NFS uses the hostname, as configured at the time of
    the first NFS mount, together with some fixed content such as the
    name "Linux NFS" and the particular protocol version.  When the
    hostname is guaranteed to be unique among all clients which access
    the same server, and stable across reboots, this is sufficient.
    If hostname uniqueness cannot be assumed, extra identity
    information must be provided.  If the hostname is not stable,
    unclean restarts may suffer unavoidable delays.

>
> > then something like
> > +.RS 4
> > +[ -s /etc/nfsv4-uuid ] || uuidgen > /etc/nfsv4-uuid &&
> > +.br
> > +cat /etc/nfsv4-uuid > /sys/fs/nfs/client/net/identifier
> > +.RE
> > +would suffice.
> > +.PP
> > +If a container has neither a stable name nor stable (local) storage,
> > +then it is not possible to provide a stable identifier, so providing
> > +a random identifier to ensure uniqueness would be best
> > +.RS 4
> > +uuidgen > /sys/fs/nfs/client/net/identifier
> > +.RE
> > +.RE
> > +.SS Consequences of poor identity setting
>
> This section provides context to understand the above technical
> recommendations.  I suggest this whole section should be moved
> to near the opening paragraph.

I seem to keep moving things upwards.... something has to come last.
Maybe a "(See below)" at the end of the revised first para?

>
> > +Any two concurrent clients that might access the same server must have
> > +different identifiers for correct operation, and any two consecutive
> > +instances of the same client should have the same identifier for optimal
> > +crash recovery.
>
> Also recovery from network partitions.

A network partition doesn't coincide with two consecutive instances of
the same client.  There is just one client instance and one server
instance.

>
> > +.PP
> > +If two different clients present the same identity to a server there are
> > +two possible scenarios.  If the clients use the same credential then the
> > +server will treat them as the same client which appears to be restarting
> > +frequently.  One client may manage to open some files etc, but as soon
> > +as the other client does anything the first client will lose access and
> > +need to re-open everything.
>
> This seems fuzzy.
>
> 1. If locks are lost, then there is a substantial risk of data
>    corruption.
>
> 2. Is the client itself supposed to re-open files, or are
>    applications somehow notified that they need to re-open?
>    Either of these scenarios is fraught -- I don't believe any
>    application is coded to expect to have to re-open a file
>    due to exigent circumstances.

I wasn't very happy with the description either.  I think we want some
detail, but not too much.

The "re-opening" that I mentioned is the NFS client resubmitting NFS
OPEN requests, not the application having to re-open.  However if the
application manages to get a lock, then when the "other" client
connects to the server the application will lose the lock, and all
read/write accesses on the relevant fd will result in EIO (I think).
Clearly bad.

I wanted to say the clients could end up "fighting" with each other -
the EXCHANGE_ID from one destroys the state set up by the other - but
that seems to be too much anthropomorphism.

    If two different clients present the same identity to a server
    there are two possible scenarios.  If the clients use the same
    credential then the server will treat them as the same client
    which appears to be restarting frequently.  The clients will each
    enter a loop where they establish state with the server and then
    find that the state has been destroyed by the other client, and so
    will need to establish it again.

???

>
> > +.PP
> > +If the clients use different credentials, then the second client to
> > +establish a connection to the server will be refused access.  For
> > +.B auth=sys
> > +the credential is based on hostname, so will be the same if the
> > +identities are the same.  With
> > +.B auth=krb
> > +the credential is stored in
> > +.I /etc/krb5.keytab
> > +and will be the same only if this is copied among hosts.
>
> This language implies that copying the keytab is a recommended thing
> to do.  It's not.  I mentioned it before because some customers think
> it's OK to use the same keytab across their client fleet.  But obviously
> that will result in lost open and lock state.
>
> I suggest rephrasing this last sentence to describe the negative lease
> recovery consequence of two clients happening to share the same host
> principal -- as in "This is why you shouldn't share keytabs..."

How about

    .PP
    If the clients use different credentials, then the second client
    to establish a connection to the server will be refused access,
    which is a safer failure mode.  For
    .B auth=sys
    the credential is based on hostname, so will be the same if the
    identities are the same.  With
    .B auth=krb
    the credential is stored in
    .I /etc/krb5.keytab
    so providing this isn't copied among clients the safer failure
    mode will result.

??

Thanks for your detailed review!

NeilBrown


>
> > +.PP
> > +If the identity is unique but not stable, for example if it is generated
> > +randomly on each start up of the NFS client, then crash recovery is
> > +affected.  When a client shuts down uncleanly and restarts, the server
> > +will normally detect this because the same identity is presented with
> > +different boot time (or "incarnation verifier"), and will discard old
> > +state.  If the client presents a different identifier, then the server
> > +cannot discard old state until the lease time has expired, and the new
> > +client may be delayed in opening or locking files that it was
> > +previously accessing.
> >  .SH FILES
> >  .TP 1.5i
> >  .I /etc/fstab
> > --
> > 2.35.1
> >
>
> --
> Chuck Lever