> On Mar 2, 2022, at 10:26 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Wed, 02 Mar 2022, Chuck Lever III wrote:
>> 
>>> On Feb 28, 2022, at 10:43 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>> 
>>> 
>>> When mounting NFS filesystems in a network namespace using v4, some care
>>> must be taken to ensure a unique and stable client identity.
>>> Add documentation explaining the requirements for container managers.
>>> 
>>> Signed-off-by: NeilBrown <neilb@xxxxxxx>
>>> ---
>>> 
>>> NOTE I originally suggested using uuidgen to generate a uuid from a
>>> container name. I've changed it to use the name as-is because I cannot
>>> see a justification for using a uuid - though I think that was suggested
>>> somewhere in the discussion.
>>> If someone would like to provide that justification, I'm happy to
>>> include it in the document.
>>> 
>>> Thanks,
>>> NeilBrown
>>> 
>>> 
>>> utils/mount/nfs.man | 63 +++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 63 insertions(+)
>>> 
>>> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
>>> index d9f34df36b42..4ab76fb2df91 100644
>>> --- a/utils/mount/nfs.man
>>> +++ b/utils/mount/nfs.man
>>> @@ -1844,6 +1844,69 @@ export pathname, but not both, during a remount. For example,
>>> merges the mount option
>>> .B ro
>>> with the mount options already saved on disk for the NFS server mounted at /mnt.
>>> +.SH "NFS IN A CONTAINER"
>> 
>> To be clear, this explanation is about the operation of the
>> Linux NFS client in a container environment. The server has
>> different needs that do not appear to be addressed here.
>> The section title should be clear that this information
>> pertains to the client.
> 
> The whole man page is only about the client, but I agree that clarity is
> best. I've changed the section heading to
> 
> NFS MOUNTS IN A CONTAINER

Actually I've rethought this. I think the central point of this text
needs to be how the client uniquifier works. It needs to work this way
for all client deployments, whether containerized or not. There are
some important variations that can be called out:

1. Containers (or virtualized clients)

   When multiple NFS clients run on the same physical host.

2. NAT

   NAT hasn't been mentioned before, but it is a common deployment
   scenario where multiple clients can have the same hostname and
   local IP address (a private address such as 192.168.0.55) but the
   clients all access the same NFS server.

3. NFSROOT

   Where the uniquifier has to be provided on the boot command line
   and can't be persisted locally on the client. (A rough sketch of
   what that might look like follows below.)

>>> +When NFS is used to mount filesystems in a container, and specifically
>>> +in a separate network name-space, these mounts are treated as quite
>>> +separate from any mounts in a different container or not in a
>>> +container (i.e. in a different network name-space).
>> 
>> It might be helpful to provide an introductory explanation of
>> how mount works in general in a namespaced environment. There
>> might already be one somewhere. The above text needs to be
>> clear that we are not discussing the mount namespace.
> 
> Mount namespaces are completely irrelevant for this discussion.

Agreed, mount namespaces are irrelevant to this discussion.

> This is "specifically" about "network name-spaces" as I wrote.
> Do I need to say more than that?
> Maybe a sentence "Mount namespaces are not relevant" ??

I would say by way of introduction that "An NFS mount, unlike a local
filesystem mount, exists in both a mount namespace and a network
namespace", then continue with "this is specifically about network
namespaces."
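To make the NFSROOT case (3) above concrete: nothing can be read from
local storage that early, so the uniquifier has to arrive on the kernel
command line. A rough sketch of what that might look like, assuming the
nfs.nfs4_unique_id module parameter described in the kernel's admin
guide (the value shown is only a placeholder):

    root=/dev/nfs nfsroot=192.168.0.1:/export/client1 ip=dhcp nfs.nfs4_unique_id=<unique-id>

Whatever generates the boot command line (PXE configuration, a
bootloader template, and so on) would be responsible for handing each
client a distinct, stable value.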
>>> +.P
>>> +In the NFSv4 protocol, each client must have a unique identifier.
>> 
>> ... each client must have a persistent and globally unique
>> identifier.
> 
> I dispute "globally". The id only needs to be unique among clients of
> a given NFS server.

Practically speaking, that is correct in a limited sense. However there
is no limit on the use of a laptop (i.e., a physically portable client)
to access any NFS server that is local to it. We have no control over
how clients are physically deployed. A public NFS server is going to
see a vast cohort of clients, all of which need to have unique
identifiers. There's no interaction amongst the clients themselves to
determine whether there are identifier collisions. Global uniqueness
therefore is a requirement to make that work seamlessly.

> I also dispute "persistent" in the context of "must".
> Unless I'm missing something, a lack of persistence only matters when a
> client stops while still holding state, and then restarts within the
> lease period. It will then be prevented from establishing conflicting
> state until the lease period ends.

The client's identifier needs to be persistent so that:

1. If the server reboots, it can recognize when clients are
   re-establishing their lock and open state versus an unfamiliar
   client creating lock and open state that might involve files that
   an existing client has open.

2. If the client reboots, the server is able to tie the rebooted
   client to an existing lease so that the lease and all of the
   client's previous lock and open state are properly purged.

There are moments when a client's identifier can change without
consequences. It's not entirely relevant to the discussion to go into
detail about when those moments occur.

> So persistence is good, but is not a
> hard requirement. Uniqueness IS a hard requirement among concurrent
> clients of the one server.

OK, then you were using the colloquial meaning of "must" and "should",
not the RFC 2119 meanings. Capitalizing them was very confusing.
Happily you provided a good replacement below.

>>> +This is used by the server to determine when a client has restarted,
>>> +allowing any state from a previous instance can be discarded.
>> 
>> Lots of passive voice here :-)
>> 
>> The server associates a lease with the client's identifier
>> and a boot instance verifier. The server attaches all of
>> the client's file open and lock state to that lease, which
>> it preserves until the client's boot verifier changes.
> 
> I guess I'm a passivist. If we are going for that level of detail we
> need to mention lease expiry too.
> 
> .... it preserves until the lease time passes without any renewal from
> the client, or the client's boot verifier changes.

This is not entirely true. A server is not required to dispense with a
client's lease state when the lease period is up. The Linux server does
that today, but soon it won't, instead waiting until a conflicting open
or lock request before it purges the lease of an unreachable client.
The requirement is actually the converse: the server must preserve a
client's open and lock state during the lease period. Outside of the
lease period, behavior is an implementation choice.

> In another email you add:
> 
>> Oh and also, this might be a good opportunity to explain
>> how the server requires that the client use not only the
>> same identifier string, but also the same principal to
>> reattach itself to its open and lock state after a server reboot.
>> 
>> This is why the Linux NFS client attempts to use Kerberos
>> whenever it can for this purpose. Using AUTH_SYS invites
>> another client that happens to have the same identifier
>> to trigger the server to purge that client's open and lock
>> state.
> 
> How relevant is this to the context of a container?

It's relevant because the client's identity consists of the
nfs_client_id4 string and the principal and authentication flavor used
to establish the lease. If a container is manufactured by duplicating a
template that contains a keytab (and yes, I've seen this done in
practice) the principal and flavor will be the same in the duplicated
container, and that will be a problem.

If the client is using only AUTH_SYS, as I mention above, then the only
distinction is the nfs_client_id4 string itself (since clients
typically use UID 0 as the principal in this case). There is really no
protection here -- and admins need to be warned about this because
their users will see open and lock state disappearing for no reason
because some clients happen to choose the same nfs_client_id4 string
and are purging each other's leases.

> How much extra context would we need to add to make the mention of
> credentials coherent?
> Maybe we should add another section about credentials, and add it just
> before this one??

See above. The central discussion needs to be about client identity
IMO.

>>> So any two
>>> +concurrent clients that might access the same server MUST have
>>> +different identifiers, and any two consecutive instances of the same
>>> +client SHOULD have the same identifier.
>> 
>> Capitalized MUST and SHOULD have specific meanings in IETF
>> standards that are probably not obvious to average readers
>> of man pages. To average readers, this looks like shouting.
>> Can you use something a little friendlier?
>> 
> 
> How about:
> 
> Any two concurrent clients that might access the same server must
> have different identifiers for correct operation, and any two
> consecutive instances of the same client should have the same
> identifier for optimal handling of an unclean restart.

Nice.

>>> +.P
>>> +Linux constructs the identifier (referred to as
>>> +.B co_ownerid
>>> +in the NFS specifications) from various pieces of information, three of
>>> +which can be controlled by the sysadmin:
>>> +.TP
>>> +Hostname
>>> +The hostname can be different in different containers if they
>>> +have different "UTS" name-spaces. If the container system ensures
>>> +each container sees a unique host name,
>> 
>> Actually, it turns out that is a pretty big "if". We've
>> found that our cloud customers are not careful about
>> setting unique hostnames. That's exactly why the whole
>> uniquifier thing is so critical!
> 
> :-) I guess we keep it as "if" though, not "IF" ....

And as mentioned above, it's not possible for them to select hostnames
and IP addresses (in particular in the private IP address range) that
are guaranteed to be unique enough for a given server. The choices are
completely uncoordinated and have a considerable risk of collision.

>>> then this is
>>> +sufficient for a correctly functioning NFS identifier.
>>> +The host name is copied when the first NFS filesystem is mounted in
>>> +a given network name-space. Any subsequent change in the apparent
>>> +hostname will not change the NFSv4 identifier.
>> 
>> The purpose of using a uuid here is that, given its
>> definition in RFC 4122, it has very strong global
>> uniqueness guarantees.
> 
> A uuid generated from a given string (uuidgen -N $name ...)
> has the same
> uniqueness as the $name. Turning it into a uuid doesn't improve the
> uniqueness. It just provides a standard format and obfuscates the
> original. Neither of those seem necessary here.

If indeed that's what's going on, then that's the wrong approach. We
need to have a globally unique identifier here. If hashing a hostname
has the risk that the digest will be the same for two clients, then
that version of UUID is not usable for our purpose.

The non-globally unique versions of UUID are hardly used any more
because folks who use UUIDs generally need a guarantee of global
uniqueness without a central coordinating authority. Time-based and
randomly generated UUIDs are typically the only styles that are used
any more.

> I think Ben is considering using /etc/machine-id. Creating a uuid from
> that does make it any better.

I assume you mean "does /not/ make it any better".

As long as the machine-id is truly random and is not, say, a hash of
the hostname, then it should work fine. The only downside of machine-id
is the man page's stipulation that the machine-id shouldn't be publicly
exposed on the network, which is why it ought to be at least hashed
before it is used as part of an nfs_client_id4.

So I guess there's a third requirement, aside from persistence and
global uniqueness: information about the sender (client in this case)
is not inadvertently leaked onto the open network.

>> Using a UUID makes hostname uniqueness irrelevant.
> 
> Only if the UUID is created appropriately. If, for example, it is
> created with -N from some name that is unique on the host, then it needs
> to be combined with the hostname to get sufficient uniqueness.

Then that's the wrong version of UUID to use.

>> Again, I think our goal should be hiding all of this
>> detail from administrators, because once we get this
>> mechanism working correctly, there is absolutely no
>> need for administrators to bother with it.
> 
> Except when things break. Then admins will appreciate having the
> details so they can track down the breakage. My desktop didn't boot
> this morning. Systemd didn't tell me why it was hanging though I
> eventually discovered that it was "wicked.service" that wasn't reporting
> success. So I'm currently very focused on the need to provide clarity
> to sysadmins, even of "irrelevant" details.
> 
> But this documentation isn't just for sysadmins, it is for container
> developers too, so they can find out how to make their container work
> with NFS.

An alternative location for this detail would be under Documentation/.
A man page is possibly not the right venue for a detailed explanation
of protocol and implementation; man pages usually are limited to quick
summaries of interfaces.

>> The remaining part of this text probably should be
>> part of the man page for Ben's tool, or whatever is
>> coming next.
> 
> My position is that there is no need for any tool.

Trond's earlier point about having to repeat this functionality for
other ways of mounting NFS (e.g. Busybox) suggests we have to have a
separate tool, even though this is only a handful of lines of code.

> The total amount of
> code needed is a couple of lines as presented in the text below. Why
> provide a wrapper just for that?
> We *cannot* automatically decide how to find a name or where to store a
> generated uuid, so there is no added value that a tool could provide.

I don't think anyone has yet demonstrated (or even stated) this is
impossible. Can you explain why you believe this?
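And just to show the scale of what I mean: a container manager (or a
tiny helper it invokes) could do something along these lines. This is
only a rough sketch, not a design; it assumes the per-network-namespace
/sys/fs/nfs/net/nfs_client/identifier file that recent kernels expose,
and /etc/nfsv4-identifier is purely an illustrative place to persist
the value:

    # One-time setup, when this client/container instance is created
    # (/etc/nfsv4-identifier is only an illustrative location):
    [ -s /etc/nfsv4-identifier ] || uuidgen --random > /etc/nfsv4-identifier

    # Before the first NFS mount in this network namespace:
    cat /etc/nfsv4-identifier > /sys/fs/nfs/net/nfs_client/identifier

    # Or, as discussed above, derive the value from the machine-id,
    # hashed so the machine-id itself is never sent on the wire:
    # sha256sum /etc/machine-id | awk '{print $1}' > /sys/fs/nfs/net/nfs_client/identifier

A randomly generated UUID persisted this way gives the global
uniqueness and persistence discussed above without deriving anything
from the hostname.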
> We cannot unilaterally fix container systems. We can only tell people
> who build these systems of the requirements for NFS.

--
Chuck Lever