On Sat, 26 Mar 2022, Chuck Lever III wrote:
> Hi Neil-
>
> > On Mar 24, 2022, at 8:24 PM, NeilBrown <neilb@xxxxxxx> wrote:
> >
> >
> > [[ This implements an idea I had while discussing the issues around
> > NFSv4 client identity.  It isn't a solution, but instead aims to make
> > the problem more immediately visible so that sites can "fix" their
> > configuration before they get strange errors.
> > I'm not convinced it is a good idea, but it seems worthy of a little
> > discussion at least.
> > There is a follow up patch to nfs-utils which expands this more
> > towards a b-grade solution, but again is very open for discussion.
> > ]]
> >
> > The default cl_owner_id is based on host name which may be common
> > amongst different network namespaces.  If two clients in different
> > network namespaces with the same cl_owner_id attempt to mount from the
> > same server, problems will occur as each client can "steal" the lease
> > from the other.
>
> The immediate issue, IIUC, is that this helps only when the potentially
> conflicting containers are located on the same physical host. I would
> expect there are similar, but less probable, collision risks across
> a population of clients.

I see that as a separate issue - certainly similar but probably
requiring a separate solution.  I had hoped to propose a (partial)
solution to that at the same time, but it proved challenging.

I would like to automatically set nfs.nfs4_unique_id to something based
on /etc/machine-id if it isn't otherwise set.
  - we could auto-generate /etc/modprobe.d/00-nfs-identity.conf
    but I suspect that would override anything on the kernel command
    line.
  - we could run a tool at boot and when udev notices that the module is
    loaded, and set the parameter if it isn't already set, but that
    might happen after the first mount
  - we could get mount.nfs to check-and-set, but that might run in a
    mount namespace which sees a different /etc/machine-id
  - we could change the kernel to support another module parameter,
    nfs.nfs4_unique_id_default, and set that via /etc/modprobe.d.
    Then the kernel uses it only if nfs4_unique_id is not set.

I think this idea would be sufficiently safe if we could make it work.
I can't see how to make it work without a kernel change - and I don't
really like the kernel change I suggested.

>
> I guess I was also under the impression that NFS mount(2) can return
> EADDRINUSE already, but I might be wrong about that.

Maybe it could return EADDRINUSE if all privileged ports were in use
... I'd need to check that.

>
> In regard to the general issues around client ID, my approach has been
> to increase the visibility of CLID_INUSE errors. At least on the
> server, there are trace points that fire when this happens. Perhaps
> the server and client could be more verbose about this situation.

I found the RFC a bit unclear here, but my understanding is that
CLID_INUSE will only get generated if krb5 authentication is used for
EXCHANGE_ID, and the credentials for the clients are different.  This is
certainly a case worth handling well, but I doubt it would affect a high
proportion of NFS installations.

Or have I misunderstood?

>
> Our server might also track the last few boot verifiers for a client
> ID, and if it sees them repeating, issue a warning? That becomes
> onerous if there are more than a handful of clients with the same
> co_ownerid, however.

That's an interesting idea.  There would be no guarantees, but it would
probably report some errors, which would be enough to flag the problem
to the sysadmin.

Thanks,
NeilBrown