On Tue, May 14, 2024 at 6:13 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Olga Kornievskaia [mailto:aglo@xxxxxxxxx]
> > Sent: Tuesday, May 14, 2024 2:50 PM
> > To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
> > Cc: Chuck Lever III <chuck.lever@xxxxxxxxxx>; Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>
> > Subject: Re: sm notify (nlm) question
> >
> > On Tue, May 14, 2024 at 5:36 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:
> > >
> > > > On May 14, 2024, at 2:56 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > Given that not everything for NFSv3 has a specification, I post a
> > > > > question here (as it concerns the linux v3 (client) implementation),
> > > > > but I ask a generic question with respect to NOTIFY sent by an NFS server.
> > > >
> > > > There is a standard:
> > > >
> > > > https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
> > > >
> > > > > A NOTIFY message that is sent by an NFS server upon reboot has a
> > > > > monitor name and a state. This "state" is an integer and is
> > > > > modified on each server reboot. My question is: what about state
> > > > > value uniqueness? Is there somewhere some notion that this value
> > > > > has to be unique (as in, say, a random value)?
> > > > >
> > > > > Here's a problem. Say a client has 2 mounts to ip1 and ip2 (both
> > > > > representing the same DNS name) and acquires a lock per mount. Now
> > > > > say each of those servers reboots. Once up, they each send a NOTIFY
> > > > > call and each use a timestamp as the basis for their "state" value --
> > > > > which is very likely to produce the same value for 2 servers
> > > > > rebooted at the same time (or for the linux server, where it looks like
> > > > > a counter). On the client side, once the client processes the 1st
> > > > > NOTIFY call, it updates the "state" for the monitor name (ie the
> > > > > client monitors based on a DNS name, which is the same for ip1 and
> > > > > ip2), and then, in the current code, because the 2nd NOTIFY has the
> > > > > same "state" value, that NOTIFY call is ignored. The linux
> > > > > client would never reclaim the 2nd lock (but the application
> > > > > obviously would never know it's missing a lock)
> > > > > --- data corruption.
> > > > >
> > > > > Who is to blame: is the server not allowed to send a "non-unique"
> > > > > state value? Or is the client at fault here for some reason?
> > > >
> > > > The state value is supposed to be specific to the monitored host. If
> > > > the client is indeed ignoring the second reboot notification, that's
> > > > incorrect behavior, IMO.
> > >
> > > If you are using multiple server IP addresses with the same DNS name, you
> > > may want to set:
> > >
> > > sysctl fs.nfs.nsm_use_hostnames=0
> > >
> > > The NLM will register with statd using the IP address as the name instead of
> > > the host name. Then your two IP addresses will each have a separate monitor
> > > entry and state value monitored.
> >
> > In my setup I already have this set to 0. But I'll look around the code to see
> > what it is supposed to do.
>
> Hmm, maybe it doesn't work on the client side. I don't often test NLM clients with my Ganesha work because I only run one VM and NLM clients can’t function on the same host as any server other than knfsd...

I've been staring at and tracing the code, and here's what I conclude: the use of nsm_use_hostnames toggles nothing that helps.
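(For reference, since everything below hinges on how little information the client gets: per the spec Chuck pointed to, the SM_NOTIFY arguments are just a monitor name and an integer state. Roughly this, rendered as C rather than the spec's XDR, so treat it as a paraphrase:

/* SM_NOTIFY arguments, paraphrased from the NSM spec linked above (the
 * wire format is XDR; SM_MAXSTRLEN is 1024 there, if I'm reading it right).
 */
struct stat_chge {
        char *mon_name;         /* name of the host that rebooted */
        int state;              /* that host's new state number */
};

So the only thing the client can key on is mon_name -- the DNS name, in our case -- plus that one integer, which is why two servers rebooting at the same moment and handing out the same state value collide.)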
No matter what, statd always stores whatever it is monitoring based on the DNS name (git blame points at nfs-utils commit 0da56f7d359475837008ea4b8d3764fe982ef512, "statd - use dnsname to ensure correct matching of NOTIFY requests"). What's worse is that when statd receives a 2nd monitoring request from lockd for something that maps to the same DNS name, it overwrites the previous monitoring information it had. When a NOTIFY arrives from an IP matching the DNS name, statd does the downcall with whatever monitoring information lockd gave it last. Therefore all the other locks will never be recovered.

What I struggle with is how to solve this problem. Say ip1 and ip2 each run an NFS server and both are known under the same DNS name, foo.bar.com. Does that mean they represent the "same" server? Can we assume that if one of them "rebooted" then the other rebooted as well?

It seems like we can't go backwards and return to monitoring by IP. If we did, I can see that we'd get into trouble when the rebooted server comes back up with a different IP (same DNS name): it would never match the old entry and the lock would never be recovered (though I also think lockd would only send the reclaim to the IP it stored previously, which in this case would be unreachable).

If statd continues to monitor by DNS name and matches either IP to the stored entry, then the problem is the "state" update. Once statd processes one NOTIFY that matched the DNS name, its state "should" be updated, but that leads us back to the problem of ignoring the 2nd NOTIFY call. If statd were instead changed to store every monitor handle lockd asked it to monitor, then when the 1st NOTIFY call comes we could ask lockd to recover "all" the stored handles (a rough sketch of what I mean is below). But that circles back to my question: can we assume that if one IP rebooted, all of them rebooted?

Perhaps it's lockd that needs to change in how it keeps track of servers that hold locks. The behaviour seems to have changed in 2010 (commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37, "lockd: Create client-side nlm_host cache"), when the nlm_host cache was introduced, keyed by a hash of the IP address. It seems that before that, things were based on the DNS name, in line with statd.

Does anybody have thoughts on whether statd or lockd needs to change?
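The sketch I mentioned above, purely to make "store multiple monitor handles" concrete -- none of these names exist in nfs-utils, and nsm_downcall() is just a stand-in for however statd would actually tell lockd about the reboot:

#include <stdio.h>
#include <string.h>

struct mon_handle {
        char dnsname[256];              /* canonical DNS name we matched on */
        char mon_name[256];             /* mon_name lockd passed in SM_MON */
        int state;                      /* last NSM state seen for this entry */
        struct mon_handle *next;
};

static struct mon_handle *monitors;     /* every active SM_MON registration */

/* Stand-in for the real statd -> lockd notification path. */
static void nsm_downcall(const char *mon_name, int new_state)
{
        printf("downcall: %s rebooted, new state %d\n", mon_name, new_state);
}

/*
 * Called when an SM_NOTIFY arrives from an address that resolves to dnsname.
 * Instead of overwriting a single record per DNS name, walk every
 * registration that maps to it and downcall for each one.
 */
void handle_notify(const char *dnsname, int new_state)
{
        struct mon_handle *m;

        for (m = monitors; m != NULL; m = m->next) {
                if (strcmp(m->dnsname, dnsname) != 0)
                        continue;       /* different server, leave it alone */
                if (m->state == new_state)
                        continue;       /* this reboot was already handled */
                nsm_downcall(m->mon_name, new_state);
                m->state = new_state;   /* state kept per handle, not per name */
        }
}

With per-handle state and a downcall for every registration that maps to the rebooted DNS name, a 2nd NOTIFY carrying the same state value becomes harmless: all the locks were already queued for recovery on the 1st one.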