On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
> On Tue, May 14, 2024 at 6:13 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Olga Kornievskaia [mailto:aglo@xxxxxxxxx]
> > > Sent: Tuesday, May 14, 2024 2:50 PM
> > > To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
> > > Cc: Chuck Lever III <chuck.lever@xxxxxxxxxx>; Linux NFS Mailing
> > > List <linux-nfs@xxxxxxxxxxxxxxx>
> > > Subject: Re: sm notify (nlm) question
> > >
> > > On Tue, May 14, 2024 at 5:36 PM Frank Filz
> > > <ffilzlnx@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > > > On May 14, 2024, at 2:56 PM, Olga Kornievskaia
> > > > > > <aglo@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > Given that not everything for NFSv3 has a specification, I
> > > > > > post a question here (as it concerns linux v3 (client)
> > > > > > implementation) but I ask a generic question with respect
> > > > > > to NOTIFY sent by an NFS server.
> > > > >
> > > > > There is a standard:
> > > > >
> > > > > https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
> > > > >
> > > > > > A NOTIFY message that is sent by an NFS server upon reboot
> > > > > > has a monitor name and a state. This "state" is an integer
> > > > > > and is modified on each server reboot. My question is: what
> > > > > > about state value uniqueness? Is there somewhere some
> > > > > > notion that this value has to be unique (as in say a random
> > > > > > value).
> > > > > >
> > > > > > Here's a problem. Say a client has 2 mounts to ip1 and ip2
> > > > > > (both representing the same DNS name) and acquires a lock
> > > > > > per mount. Now say each of those servers reboot. Once up
> > > > > > they each send a NOTIFY call and each use a timestamp as
> > > > > > basis for their "state" value -- which is very likely to
> > > > > > produce the same value for 2 servers rebooted at the same
> > > > > > time (or for the linux server that looks like a counter).
> > > > > > On the client side, once the client processes the 1st
> > > > > > NOTIFY call, it updates the "state" for the monitor name
> > > > > > (ie a client monitors based on a DNS name which is the same
> > > > > > for ip1 and ip2) and then in the current code, because the
> > > > > > 2nd NOTIFY has the same "state" value this NOTIFY call
> > > > > > would be ignored. The linux client would never reclaim the
> > > > > > 2nd lock (but the application obviously would never know
> > > > > > it's missing a lock) --- data corruption.
> > > > > >
> > > > > > Who is to blame: is the server not allowed to send
> > > > > > "non-unique" state value? Or is the client at fault here
> > > > > > for some reason?
> > > > >
> > > > > The state value is supposed to be specific to the monitored
> > > > > host. If the client is indeed ignoring the second reboot
> > > > > notification, that's incorrect behavior, IMO.
> > > >
> > > > If you are using multiple server IP addresses with the same DNS
> > > > name, you may want to set:
> > > >
> > > > sysctl fs.nfs.nsm_use_hostnames=0
> > > >
> > > > The NLM will register with statd using the IP address as name
> > > > instead of host name. Then your two IP addresses will each have
> > > > a separate monitor entry and state value monitored.
> > >
> > > In my setup I already have this set to 0. But I'll look around
> > > the code to see what it is supposed to do.
> >
> > Hmm, maybe it doesn't work on the client side. I don't often test
> > NLM clients with my Ganesha work because I only run one VM and NLM
> > clients can't function on the same host as any server other than
> > knfsd...
>
> I've been staring and tracing the code and here's what I conclude:
> the use of nsm_use_hostname toggles nothing that helps. No matter
> what, statd always stores whatever it is monitoring based on the DNS
> name (looks like git blame says it's due to nfs-utils's commit
> 0da56f7d359475837008ea4b8d3764fe982ef512 "statd - use dnsname to
> ensure correct matching of NOTIFY requests"). Now what's worse is
> that when statd receives a 2nd monitoring request from lockd for
> something that maps to the same DNS name, statd overwrites the
> previous monitoring information it had. When a NOTIFY arrives from an
> IP matching the DNS name, statd does the downcall and it will send
> whatever the last monitoring information lockd gave it. Therefore all
> the other locks will never be recovered.
>
> What I struggle with is how to solve this problem. Say ip1 and ip2
> run an NFS server and both are known under the same DNS name:
> foo.bar.com. Does it mean that they represent the "same" server? Can
> we assume that if one of them "rebooted" then the other rebooted as
> well? It seems like we can't go backwards and go back to monitoring
> by IP. In that case I can see that we'll get in trouble if the
> rebooted server indeed comes back up with a different IP (same DNS
> name) and then it would never match the old entry and the lock would
> never be recovered (but then also I think lockd will only send the
> lock to the IP it stored previously, which in this case would be
> unreachable). If statd continues to monitor by DNS name and then
> matches either IP to the stored entry, then the problem comes with
> the "state" update. Once statd processes one NOTIFY which matched the
> DNS name its state "should" be updated, but then it would lead us
> back into the problem of ignoring the 2nd NOTIFY call. If statd were
> to be changed to store multiple monitor handles lockd asked to
> monitor, then when the 1st NOTIFY call comes we can ask lockd to
> recover "all" the stored handles. But then it circles back to my
> question: can we assume that if one IP rebooted, all IPs rebooted?
>
> Perhaps it's lockd that needs to change in how it keeps track of
> servers that hold locks. The behaviour seems to have changed in 2010
> (with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37 "lockd: Create
> client-side nlm_host cache") when the nlm_host cache was introduced,
> written to be based on a hash of the IP. It seems that before, things
> were based on a DNS name, making it in line with statd.
>
> Does anybody have any thoughts as to whether statd or lockd needs to
> change?
>

I believe Tom Talpey is to blame for the nsm_use_hostname stuff. That
all came from his 2006 Connectathon talk
https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
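
A minimal sketch of the failure mode described in the quoted message,
assuming a client that keys its monitor entries by DNS name, lets a
later monitor request overwrite an earlier one, and drops a NOTIFY
whose state equals the stored state. This is a toy model in Python,
not statd or lockd code; MonitorTable, monitor(), notify() and the
handle strings are hypothetical names used only for illustration.

```python
class MonitorTable:
    """Toy client-side monitor list keyed by DNS name (hypothetical)."""

    def __init__(self):
        # dns_name -> {"state": last seen state or None, "handle": str}
        self.entries = {}

    def monitor(self, dns_name, handle):
        # Assumption: a second request for the same DNS name replaces
        # the stored handle instead of being kept alongside it.
        old = self.entries.get(dns_name)
        state = old["state"] if old else None
        self.entries[dns_name] = {"state": state, "handle": handle}

    def notify(self, dns_name, state):
        # Assumption: a NOTIFY carrying an already-seen state is
        # ignored. Returns the handle to recover, or None if dropped.
        entry = self.entries.get(dns_name)
        if entry is None or entry["state"] == state:
            return None
        entry["state"] = state
        return entry["handle"]


table = MonitorTable()
table.monitor("foo.bar.com", "lock-via-ip1")
table.monitor("foo.bar.com", "lock-via-ip2")  # overwrites the ip1 handle

# Both servers reboot and happen to send the same state value.
first = table.notify("foo.bar.com", 5)
second = table.notify("foo.bar.com", 5)
```

In this model only one handle is ever handed back for recovery: the
first NOTIFY returns the most recently stored handle and the second
is dropped, so the lock taken via ip1 is never reclaimed.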