Re: sm notify (nlm) question

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 22 May 2024 16:20:18 +0000

On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
> On Tue, May 14, 2024 at 6:13 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
> wrote:
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Olga Kornievskaia [mailto:aglo@xxxxxxxxx]
> > > Sent: Tuesday, May 14, 2024 2:50 PM
> > > To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
> > > Cc: Chuck Lever III <chuck.lever@xxxxxxxxxx>; Linux NFS Mailing List
> > > <linux-nfs@xxxxxxxxxxxxxxx>
> > > Subject: Re: sm notify (nlm) question
> > > 
> > > On Tue, May 14, 2024 at 5:36 PM Frank Filz
> > > <ffilzlnx@xxxxxxxxxxxxxx> wrote:
> > > > 
> > > > > > On May 14, 2024, at 2:56 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > > > > > 
> > > > > > Hi folks,
> > > > > > 
> > > > > > Given that not everything for NFSv3 has a specification, I'm
> > > > > > posting a question here (as it concerns the Linux v3 client
> > > > > > implementation), but I'm asking a generic question about the
> > > > > > NOTIFY sent by an NFS server.
> > > > > 
> > > > > There is a standard:
> > > > > 
> > > > > https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
> > > > > 
> > > > > 
> > > > > > A NOTIFY message that is sent by an NFS server upon reboot has
> > > > > > a monitor name and a state. This "state" is an integer and is
> > > > > > modified on each server reboot. My question is: what about
> > > > > > state value uniqueness? Is there some notion somewhere that
> > > > > > this value has to be unique (say, a random value)?
> > > > > > 
> > > > > > Here's a problem. Say a client has 2 mounts to ip1 and ip2
> > > > > > (both representing the same DNS name) and acquires a lock per
> > > > > > mount. Now say each of those servers reboots. Once up, they
> > > > > > each send a NOTIFY call and each uses a timestamp as the basis
> > > > > > for its "state" value (or, for the Linux server, what looks
> > > > > > like a counter), which is very likely to produce the same
> > > > > > value for 2 servers rebooted at the same time. On the client
> > > > > > side, once the client processes the 1st NOTIFY call, it
> > > > > > updates the "state" for the monitor name (i.e. the client
> > > > > > monitors based on a DNS name, which is the same for ip1 and
> > > > > > ip2). Then, in the current code, because the 2nd NOTIFY has
> > > > > > the same "state" value, that NOTIFY call is ignored. The Linux
> > > > > > client would never reclaim the 2nd lock (and the application
> > > > > > would never know it's missing a lock): data corruption.
> > > > > > 
> > > > > > Who is to blame: is the server not allowed to send a
> > > > > > non-unique state value? Or is the client at fault here for
> > > > > > some reason?
> > > > > 
> > > > > The state value is supposed to be specific to the monitored
> > > > > host. If the client is indeed ignoring the second reboot
> > > > > notification, that's incorrect behavior, IMO.
> > > > 
> > > > If you are using multiple server IP addresses with the same DNS
> > > > name, you may want to set:
> > > > 
> > > > sysctl fs.nfs.nsm_use_hostnames=0
> > > > 
> > > > The NLM will register with statd using the IP address as the name
> > > > instead of the hostname. Then your two IP addresses will each have
> > > > a separate monitor entry and monitored state value.
> > > 
> > > In my setup I already have this set to 0. But I'll look around
> > > the code to see what
> > > it is supposed to do.
> > 
> > Hmm, maybe it doesn't work on the client side. I don't often test
> > NLM clients with my Ganesha work because I only run one VM and NLM
> > clients can’t function on the same host as any server other than
> > knfsd...
> 
> I've been staring at and tracing the code, and here's what I
> conclude: toggling nsm_use_hostnames does nothing that helps. No
> matter what, statd always stores whatever it is monitoring based on
> the DNS name (git blame says it's due to nfs-utils commit
> 0da56f7d359475837008ea4b8d3764fe982ef512 "statd - use dnsname to
> ensure correct matching of NOTIFY requests"). What's worse, when
> statd receives a 2nd monitoring request from lockd for something
> that maps to the same DNS name, statd overwrites the previous
> monitoring information it had. When a NOTIFY arrives from an IP
> matching the DNS name, statd does the downcall and sends whatever
> monitoring information lockd gave it last. Therefore all the other
> locks will never be recovered.
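The failure described above can be reproduced in a few lines. This is a toy model under stated assumptions, not code from statd or lockd. The class and method names (ToyStatd, sm_mon, sm_notify) are hypothetical, and a single dict keyed by mon_name stands in for statd's monitor list. For brevity, the same-state check that the thread places on the client side is folded into the same object.

```python
class ToyStatd:
    """Hypothetical toy model of NSM monitoring keyed by DNS name.

    Not statd/lockd code. Assumes: one record per mon_name, each SM_MON
    overwrites the stored lockd cookie ("priv"), and a NOTIFY whose state
    equals the last one seen for that name is dropped as a duplicate.
    """

    def __init__(self):
        self.monitors = {}    # mon_name -> priv cookie last given by lockd
        self.last_state = {}  # mon_name -> last NSM state value seen

    def sm_mon(self, mon_name, priv):
        # A second SM_MON for the same name silently replaces the first cookie.
        self.monitors[mon_name] = priv

    def sm_notify(self, mon_name, state):
        """Return the cookies lockd would be asked to recover (possibly none)."""
        if mon_name not in self.monitors:
            return []
        if self.last_state.get(mon_name) == state:
            # Same state as the previous NOTIFY for this name: ignored.
            return []
        self.last_state[mon_name] = state
        return [self.monitors[mon_name]]


statd = ToyStatd()
statd.sm_mon("foo.bar.com", "lock-via-ip1")
statd.sm_mon("foo.bar.com", "lock-via-ip2")  # overwrites the ip1 cookie

# Both servers reboot and happen to pick the same state value.
recovered = statd.sm_notify("foo.bar.com", 3)   # NOTIFY from ip1
recovered += statd.sm_notify("foo.bar.com", 3)  # NOTIFY from ip2: dropped
print(recovered)  # ['lock-via-ip2']; the ip1 lock is never reclaimed
```

Either mechanism alone loses a lock here. The overwrite means the first NOTIFY recovers the wrong handle, and the state check means the second NOTIFY recovers nothing.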
> 
> What I struggle with is how to solve this problem. Say ip1 and ip2
> run an NFS server and both are known under the same DNS name:
> foo.bar.com. Does it mean that they represent the "same" server? Can
> we assume that if one of them "rebooted" then the other rebooted as
> well? It seems like we can't go back to monitoring by IP: in that
> case, I can see that we'll get in trouble if the rebooted server
> indeed comes back up with a different IP (same DNS name), because it
> would never match the old entry and the lock would never be
> recovered (though I also think lockd will only send the lock to the
> IP it stored previously, which in this case would be unreachable). If
> statd continues to monitor by DNS name and matches either IP to the
> stored entry, then the problem comes with the "state" update. Once
> statd processes one NOTIFY that matched the DNS name, its state
> "should" be updated, but that leads us back into the problem of
> ignoring the 2nd NOTIFY call. If statd were changed to store all the
> monitor handles lockd asked it to monitor, then when the 1st NOTIFY
> call comes we could ask lockd to recover "all" the stored handles.
> But then it circles back to my question: if one IP rebooted, can we
> assume that all IPs rebooted?
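Here is a sketch of the alternative floated above: keep every cookie lockd registered under a mon_name, and hand all of them back on the first NOTIFY that carries a new state. This is again a hypothetical toy, not a statd patch. ToyStatdAllHandles and its methods are invented for illustration.

```python
class ToyStatdAllHandles:
    """Hypothetical toy of the proposed change (not statd code).

    Every cookie lockd registers under a mon_name is kept, and the first
    NOTIFY with a new state returns all of them for recovery.
    """

    def __init__(self):
        self.monitors = {}    # mon_name -> list of priv cookies
        self.last_state = {}  # mon_name -> last NSM state value seen

    def sm_mon(self, mon_name, priv):
        cookies = self.monitors.setdefault(mon_name, [])
        if priv not in cookies:  # keep each handle once, never overwrite
            cookies.append(priv)

    def sm_notify(self, mon_name, state):
        if mon_name not in self.monitors:
            return []
        if self.last_state.get(mon_name) == state:
            return []
        self.last_state[mon_name] = state
        # Built-in assumption: a reboot of any address behind mon_name
        # means every server behind that name rebooted.
        return list(self.monitors[mon_name])


statd = ToyStatdAllHandles()
statd.sm_mon("foo.bar.com", "lock-via-ip1")
statd.sm_mon("foo.bar.com", "lock-via-ip2")
print(statd.sm_notify("foo.bar.com", 3))  # ['lock-via-ip1', 'lock-via-ip2']
print(statd.sm_notify("foo.bar.com", 3))  # []
```

The model recovers both locks only because it bakes in the assumption questioned above. If ip1 alone rebooted, lockd would also be asked to reclaim the lock held through ip2, which that server still holds.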
> 
> Perhaps it's lockd that needs to change in how it keeps track of
> servers that hold locks. The behaviour seems to have changed in 2010
> (with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37 "lockd: Create
> client-side nlm_host cache"), when the nlm_host cache was introduced,
> keyed on a hash of the IP address. It seems that before that, things
> were based on the DNS name, in line with statd.
> 
> Does anybody have any thoughts as to whether statd or lockd needs to
> change?
> 

I believe Tom Talpey is to blame for the nsm_use_hostnames stuff. That
all came from his 2006 Connectathon talk
https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx