On 11/20/22 6:44 PM, NeilBrown wrote:
> On Sun, 20 Nov 2022, Steve Dickson wrote:
>> Hey!
>>
>> On 11/14/22 11:40 PM, NeilBrown wrote:
>>> NFSv4.1 and later require the server to report a "scope".
>>> Servers with the same scope are expected to understand each other's
>>> state ids etc, though may not accept them - this ensure there can be no
>>> misunderstanding. This is helpful for migration.
>>> Servers with different scope are known to be different and if a server
>>> appears to change scope on a restart, lock recovery must not be
>>> attempted.
>>>
>>> It is important for fail-over configurations to have the same scope for
>>> all server instances. Linux NFSD sets scope to host name. It is common
>>> for fail-over configurations to use different host names on different
>>> server nodes. So the default is not good for these configurations and
>>> must be over-ridden.
>>>
>>> As discussed in
>>>   https://github.com/ClusterLabs/resource-agents/issues/1644
>>> some HA management tools attempt to address this with calls to "unshare"
>>> and "hostname" before running "rpc.nfsd". This is unnecessarily
>>> cumbersome.
>>>
>>> This patch adds a "-S" command-line option and nfsd.scope config value
>>> so that the scope can be set easily for nfsd.
>>>
>>> Signed-off-by: NeilBrown <neilb@xxxxxxx>
>>> ---
>>>  systemd/nfs.conf.man |  1 +
>>>  utils/nfsd/nfsd.c    | 17 ++++++++++++++++-
>>>  utils/nfsd/nfsd.man  | 13 ++++++++++++-
>>>  3 files changed, 29 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/systemd/nfs.conf.man b/systemd/nfs.conf.man
>>> index b95c05a68759..bfd3380ff081 100644
>>> --- a/systemd/nfs.conf.man
>>> +++ b/systemd/nfs.conf.man
>>> @@ -172,6 +172,7 @@ for details.
>>>  Recognized values:
>>>  .BR threads ,
>>>  .BR host ,
>>> +.BR scope ,
>>>  .BR port ,
>>>  .BR grace-time ,
>>>  .BR lease-time ,
>>> diff --git a/utils/nfsd/nfsd.c b/utils/nfsd/nfsd.c
>>> index 4016a761293b..249df00b448d 100644
>>> --- a/utils/nfsd/nfsd.c
>>> +++ b/utils/nfsd/nfsd.c
>>> @@ -23,6 +23,7 @@
>>>  #include <sys/socket.h>
>>>  #include <netinet/in.h>
>>>  #include <arpa/inet.h>
>>> +#include <sched.h>
>>>
>>>  #include "conffile.h"
>>>  #include "nfslib.h"
>>> @@ -39,6 +40,7 @@ static void usage(const char *);
>>>  static struct option longopts[] =
>>>  {
>>>  	{ "host", 1, 0, 'H' },
>>> +	{ "scope", 1, 0, 'S'},
>>>  	{ "help", 0, 0, 'h' },
>>>  	{ "no-nfs-version", 1, 0, 'N' },
>>>  	{ "nfs-version", 1, 0, 'V' },
>>> @@ -69,6 +71,7 @@ main(int argc, char **argv)
>>>  	int count = NFSD_NPROC, c, i, error = 0, portnum, fd, found_one;
>>>  	char *p, *progname, *port, *rdma_port = NULL;
>>>  	char **haddr = NULL;
>>> +	char *scope = NULL;
>>>  	int hcounter = 0;
>>>  	struct conf_list *hosts;
>>>  	int socket_up = 0;
>>> @@ -168,8 +171,9 @@ main(int argc, char **argv)
>>>  			hcounter++;
>>>  		}
>>>  	}
>>> +	scope = conf_get_str("nfsd", "scope");
>>>
>>> -	while ((c = getopt_long(argc, argv, "dH:hN:V:p:P:stTuUrG:L:", longopts, NULL)) != EOF) {
>>> +	while ((c = getopt_long(argc, argv, "dH:S:hN:V:p:P:stTuUrG:L:", longopts, NULL)) != EOF) {
>>>  		switch(c) {
>>>  		case 'd':
>>>  			xlog_config(D_ALL, 1);
>>> @@ -190,6 +194,9 @@ main(int argc, char **argv)
>>>  			haddr[hcounter] = optarg;
>>>  			hcounter++;
>>>  			break;
>>> +		case 'S':
>>> +			scope = optarg;
>>> +			break;
>>>  		case 'P':	/* XXX for nfs-server compatibility */
>>>  		case 'p':
>>>  			/* only the last -p option has any effect */
>>> @@ -367,6 +374,14 @@ main(int argc, char **argv)
>>>  	if (lease > 0)
>>>  		nfssvc_set_time("lease", lease);
>>>
>>> +	if (scope) {
>>> +		if (unshare(soc) < 0 ||
>
> Where did that "soc" come from?
> In the email I sent this line is
>
>   +		if (unshare(CLONE_NEWUTS) < 0 ||

I have no idea... it must be my evil twin again!! :-)

>>> +		    sethostname(scope, strlen(scope)) < 0) {
>>> +			xlog(L_ERROR, "Unable to set server scope: %m");
>>> +			error = -1;
>>> +			goto out;
>>> +		}
>>> +	}
>>
>> So setting the scope resets the utsname and hostname, which will affect
>> the entire system, possibly negatively. Breaking DNS... who knows what
>> is going to happen when the hostname is changed on the fly..
>> But with that said..
>
> No, it doesn't affect the entire system.
> The unshare() call creates a new UTS namespace that is private to this
> process. The sethostname() call then sets the host name in that UTS
> namespace - still private to this process.
> Then when rpc.nfsd asks the kernel to start some nfsd threads, the
> nfsd_svc() function in the kernel (since linux-5.7) does:
>
>     strscpy(nn->nfsd_name, utsname()->nodename, sizeof(nn->nfsd_name));
>
> which takes a copy of that new hostname for internal usage.
> Then rpc.nfsd exits and the temporary UTS namespace is destroyed.
>
> So this is all really just a somewhat unusual way to pass a config
> parameter to the kernel. It is an internal implementation detail,
> nothing more.

Got it... thanks for the explanation.
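
For anyone who wants to check the namespace behaviour described above, here is a minimal sketch, not taken from the patch. It assumes Linux; the helper names `set_scope_in_child()` and `hostname_unchanged_after()` are made up for illustration. A forked child stands in for rpc.nfsd: it unshares its UTS namespace and sets the host name there, and the parent then confirms its own host name did not change.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for unshare() and CLONE_NEWUTS */
#endif
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWUTS
#define CLONE_NEWUTS 0x04000000	/* fallback; value from <linux/sched.h> */
#endif
int unshare(int flags);		/* harmless if <sched.h> already declared it */

/*
 * Hypothetical helper: fork a child that sets "scope" as its host name
 * inside a private UTS namespace, roughly as the patch does in rpc.nfsd.
 * Returns 0 if the child managed it, 1 if unshare()/sethostname() failed
 * (typically EPERM without CAP_SYS_ADMIN), -1 if fork()/waitpid() failed.
 */
static int set_scope_in_child(const char *scope)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (unshare(CLONE_NEWUTS) < 0 ||
		    sethostname(scope, strlen(scope)) < 0)
			_exit(1);
		/* rpc.nfsd would ask the kernel to start nfsd threads here */
		_exit(0);	/* the private namespace goes away with the child */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Hypothetical check: returns 1 if the caller's own host name is the
 * same before and after the child ran, 0 if it changed, -1 on error.
 */
static int hostname_unchanged_after(const char *scope)
{
	char before[256] = "", after[256] = "";

	if (gethostname(before, sizeof(before) - 1) < 0)
		return -1;
	if (set_scope_in_child(scope) < 0)
		return -1;
	if (gethostname(after, sizeof(after) - 1) < 0)
		return -1;
	return strcmp(before, after) == 0;
}
```

Calling `hostname_unchanged_after("example-scope")` should return 1 whether or not the caller has CAP_SYS_ADMIN: with the capability the child changes only its private namespace, and without it unshare() fails before sethostname() is ever reached.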

>> I understand what you are trying to do, but I just don't think it's
>> documented well enough... Saying something like setting the scope
>> *will* change both utsname and hostname
>
> Just FYI: "utsname" is everything reported by 'uname -a' which includes
> the hostname.

Yup! steved.

>> to the given scope or something like... I just want to be more explicit
>> as to what setting the scope is actually going to do.
>>
>> steved.
>
> Thanks,
> NeilBrown
>
>>>
>>>  	i = 0;
>>>  	do {
>>>  		error = nfssvc_set_sockets(protobits, haddr[i], port);
>>> diff --git a/utils/nfsd/nfsd.man b/utils/nfsd/nfsd.man
>>> index bb99fe2b1d89..dc05f3623465 100644
>>> --- a/utils/nfsd/nfsd.man
>>> +++ b/utils/nfsd/nfsd.man
>>> @@ -35,9 +35,17 @@ Note that
>>>  .B lockd
>>>  (which performs file locking services for NFS) may still accept
>>>  request on all known network addresses. This may change in future
>>> -releases of the Linux Kernel. This option can be used multiple time
>>> +releases of the Linux Kernel. This option can be used multiple times
>>>  to listen to more than one interface.
>>>  .TP
>>> +.B \S " or " \-\-scope scope
>>> +NFSv4.1 and later require the server to report a "scope" which is used
>>> +by the clients to detect if two connections are to the same server.
>>> +By default Linux NFSD uses the host name as the scope.
>>> +.sp
>>> +It is particularly important for high-availablity configurations to ensure
>>> +that all potential server nodes report the same server scope.
>>> +.TP
>>>  .B \-p " or " \-\-port port
>>>  specify a different port to listen on for NFS requests. By default,
>>>  .B rpc.nfsd
>>> @@ -134,6 +142,9 @@ will listen on. Use of the
>>>  .B --host
>>>  option replaces all host names listed here.
>>>  .TP
>>> +.B scope
>>> +Set the server scope.
>>> +.TP
>>>  .B grace-time
>>>  The grace time, for both NFSv4 and NLM, in seconds.
>>>  .TP