Re: sm notify (nlm) question

On 5/22/2024 12:20 PM, Trond Myklebust wrote:
On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
On Tue, May 14, 2024 at 6:13 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:



-----Original Message-----
From: Olga Kornievskaia [mailto:aglo@xxxxxxxxx]
Sent: Tuesday, May 14, 2024 2:50 PM
To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>
Cc: Chuck Lever III <chuck.lever@xxxxxxxxxx>; Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>
Subject: Re: sm notify (nlm) question

On Tue, May 14, 2024 at 5:36 PM Frank Filz <ffilzlnx@xxxxxxxxxxxxxx> wrote:

On May 14, 2024, at 2:56 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:

Hi folks,

Given that not everything for NFSv3 has a specification, I'm posting
the question here (it concerns the Linux v3 (client) implementation),
but it is really a generic question about the NOTIFY call sent by an
NFS server.

There is a standard:

https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm


A NOTIFY message sent by an NFS server upon reboot carries a monitor
name and a state. This "state" is an integer and is modified on each
server reboot. My question is about state value uniqueness: is there
some notion anywhere that this value has to be unique (as in, say, a
random value)?
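
For reference, the argument SM_NOTIFY carries in the spec linked
above is just that pair; here is a rough C mirror of the spec's
stat_chge XDR struct (the C rendering is only illustrative):

struct stat_chge {
    char *mon_name;   /* name the host was monitored under */
    int   state;      /* NSM state number, new after each reboot */
};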

Here's a problem. Say a client has 2 mounts to ip1 and ip2 (both
representing the same DNS name) and acquires a lock per mount. Now say
each of those servers reboots. Once up, they each send a NOTIFY call
and each uses a timestamp as the basis for its "state" value -- which
is very likely to produce the same value for 2 servers rebooted at the
same time (or, for the Linux server, a value that looks like a
counter).

On the client side, once the client processes the 1st NOTIFY call, it
updates the "state" for the monitor name (i.e. the client monitors
based on a DNS name, which is the same for ip1 and ip2), and then, in
the current code, because the 2nd NOTIFY carries the same "state"
value, that NOTIFY call is ignored. The Linux client would never
reclaim the 2nd lock (and the application obviously would never know
it's missing a lock) --- data corruption.
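
To make the failure concrete, here is a toy C program (purely
illustrative, not the actual client or statd code) that models a
reboot-state table keyed on the DNS name:

#include <stdio.h>
#include <time.h>

struct monitored {
    const char *dnsname;
    int last_state;
};

static void notify(struct monitored *m, const char *from_ip, int state)
{
    if (state == m->last_state) {
        /* looks like a duplicate: no lock reclaim for this server */
        printf("NOTIFY from %s: state %d unchanged, ignored\n",
               from_ip, state);
        return;
    }
    m->last_state = state;
    printf("NOTIFY from %s: state %d, reclaiming locks\n",
           from_ip, state);
}

int main(void)
{
    int boot_state = (int)time(NULL);  /* both servers reboot together */
    struct monitored m = { "foo.bar.com", 0 };

    notify(&m, "ip1", boot_state);  /* locks held on ip1 get reclaimed */
    notify(&m, "ip2", boot_state);  /* ip2's NOTIFY looks stale, dropped */
    return 0;
}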

Who is to blame: is the server not allowed to send a non-unique state
value? Or is the client at fault here for some reason?

The state value is supposed to be specific to the monitored host. If
the client is indeed ignoring the second reboot notification, that's
incorrect behavior, IMO.

If you are using multiple server IP addresses with the same DNS name,
you may want to set:

sysctl fs.nfs.nsm_use_hostnames=0

Then NLM will register with statd using the IP address as the name
instead of the host name, and your two IP addresses will each have a
separate monitor entry and monitored state value.

In my setup I already have this set to 0. But I'll look around the
code to see what it is supposed to do.

Hmm, maybe it doesn't work on the client side. I don't often test NLM
clients with my Ganesha work because I only run one VM and NLM clients
can't function on the same host as any server other than knfsd...

I've been staring at and tracing the code, and here's what I
conclude: the use of nsm_use_hostnames toggles nothing that helps. No
matter what, statd always stores whatever it is monitoring based on
the DNS name (git blame points at nfs-utils commit
0da56f7d359475837008ea4b8d3764fe982ef512, "statd - use dnsname to
ensure correct matching of NOTIFY requests"). What's worse is that
when statd receives a 2nd monitoring request from lockd for something
that maps to the same DNS name, statd overwrites the previous
monitoring information it had. When a NOTIFY arrives from an IP
matching the DNS name, statd does the downcall with only the last
monitoring information lockd gave it. Therefore all the other locks
will never be recovered.
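
A toy model of that overwrite (again, not the nfs-utils code; the
names here are just illustrative):

#include <stdio.h>
#include <string.h>

struct mon_rec {
    char dnsname[64];
    char priv[64];   /* opaque cookie lockd passed with its SM_MON */
};

static struct mon_rec table[16];
static int nrecs;

static void sm_mon(const char *dnsname, const char *priv)
{
    for (int i = 0; i < nrecs; i++) {
        if (strcmp(table[i].dnsname, dnsname) == 0) {
            /* same DNS name: the earlier handle is silently replaced */
            snprintf(table[i].priv, sizeof(table[i].priv), "%s", priv);
            return;
        }
    }
    snprintf(table[nrecs].dnsname, sizeof(table[nrecs].dnsname), "%s", dnsname);
    snprintf(table[nrecs].priv, sizeof(table[nrecs].priv), "%s", priv);
    nrecs++;
}

static void sm_notify(const char *dnsname)
{
    for (int i = 0; i < nrecs; i++)
        if (strcmp(table[i].dnsname, dnsname) == 0)
            printf("downcall with handle '%s' only\n", table[i].priv);
}

int main(void)
{
    sm_mon("foo.bar.com", "handle-for-ip1");
    sm_mon("foo.bar.com", "handle-for-ip2");  /* replaces ip1's handle */
    sm_notify("foo.bar.com");   /* ip1's lock is never reclaimed */
    return 0;
}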

What I struggle with is how to solve this problem. Say ip1 and ip2
run an NFS server and both are known under the same DNS name:
foo.bar.com. Does that mean they represent the "same" server? Can we
assume that if one of them "rebooted" then the other rebooted as well?

It seems like we can't go backwards and return to monitoring by IP. In
that case I can see that we'll get in trouble if the rebooted server
indeed comes back up with a different IP (same DNS name); it would
then never match the old entry and the lock would never be recovered
(but then I also think lockd will only send the lock to the IP it
stored previously, which in this case would be unreachable). If statd
continues to monitor by DNS name and matches either IP to the stored
entry, then the problem comes with the "state" update: once statd
processes one NOTIFY that matched the DNS name, its state "should" be
updated, but that leads us back into the problem of ignoring the 2nd
NOTIFY call. If statd were changed to store the multiple monitor
handles lockd asked it to monitor, then when the 1st NOTIFY call comes
we could ask lockd to recover "all" the stored handles. But then it
circles back to my question: can we assume that if one IP rebooted,
all IPs rebooted?
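
A sketch of that last idea, purely hypothetical and not a patch:
statd would keep a list of handles per DNS name and downcall for every
one of them on the first matching NOTIFY, on the assumption that one
IP rebooting implies the whole DNS name rebooted:

struct mon_handle {
    struct mon_handle *next;
    char priv[64];               /* one entry per SM_MON from lockd */
};

struct mon_rec {
    char dnsname[64];
    struct mon_handle *handles;  /* every handle known under this name */
};

/* Stub for the downcall that tells lockd to start reclaim. */
static void downcall(const char *priv)
{
    (void)priv;
}

/* On the first SM_NOTIFY matching the DNS name, notify them all. */
static void notify_all(struct mon_rec *rec)
{
    for (struct mon_handle *h = rec->handles; h; h = h->next)
        downcall(h->priv);
}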

Perhaps it's lockd that needs to change in how it keeps track of
servers that hold locks. The behaviour seems to have changed in 2010
(with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37, "lockd: Create
client-side nlm_host cache"), when the nlm_host cache was introduced,
keyed on a hash of the IP address. It seems that before that, things
were based on a DNS name, which kept lockd in line with statd.
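
Roughly, the mismatch looks like this (illustrative only, not the
kernel code; the bucket count is made up):

#include <arpa/inet.h>
#include <netinet/in.h>

#define HOST_HASH_BUCKETS 32   /* arbitrary bucket count for the sketch */

/* lockd since 2010: host records are looked up by (a hash of) the IP */
static unsigned int lockd_key(const struct sockaddr_in *sap)
{
    return ntohl(sap->sin_addr.s_addr) % HOST_HASH_BUCKETS;
}

/* statd: NOTIFY requests are matched against the stored DNS name */
static const char *statd_key(const char *dnsname)
{
    return dnsname;
}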

Does anybody have any thoughts as to whether statd or lockd needs to
change?


I believe Tom Talpey is to blame for the nsm_use_hostname stuff. That
all came from his 2006 Connectathon talk
https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf

I deny that!! :) All that talk was intended to do was point out how
deeply flawed the statmon protocol is, and how badly it was then
implemented. However, hostnames may be a slight improvement over the
mess that was 2006. And it's been kinda sorta working since then.

Personally I still think trying to "fix" nsm is a fool's errand.
It's just never ever going to succeed. Particularly if both the
clients *and* servers have to change. NFS4.1 is the better way.

Tom.



