On 10/1/20 4:43 PM, J. Bruce Fields wrote:
On Thu, Oct 01, 2020 at 04:05:13PM -0500, Patrick Goetz wrote:
On 10/1/20 3:06 PM, J. Bruce Fields wrote:
On Thu, Oct 01, 2020 at 01:41:39PM -0500, Patrick Goetz wrote:
Hi Bruce,
Thanks for the reply. See below.
On 10/1/20 1:30 PM, J. Bruce Fields wrote:
On Fri, Sep 25, 2020 at 09:40:16AM -0500, Patrick Goetz wrote:
My university's information security office does not like rpcbind and
will automatically quarantine any system on which they detect a
portmapper listening on an exposed port.
Since I exclusively use NFSv4, I was happy to "learn" that NFSv4
doesn't require rpcbind any more. For example, here's what the
current RHEL documentation says:
"NFS version 4 (NFSv4) works through firewalls and on the Internet,
no longer requires an rpcbind service, supports Access Control Lists
(ACLs), and utilizes stateful operations."
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/exporting-nfs-shares_managing-file-systems#introduction-to-nfs_exporting-nfs-shares
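(In practice, "works through firewalls" comes down to a v4-only server needing just TCP port 2049 reachable; on the Ubuntu side that is roughly the following, assuming ufw is the firewall in use:

ufw allow 2049/tcp

with everything else, including rpcbind's port 111, left closed.)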
I'm using Ubuntu 20.04 rather than RHEL, but the nfs-server service
absolutely will not start if it can't launch rpcbind as a precursor:
-----------------------------
root@helios:~# systemctl stop rpcbind
Warning: Stopping rpcbind.service, but it can still be activated by:
rpcbind.socket
root@helios:~# systemctl mask rpcbind
Created symlink /etc/systemd/system/rpcbind.service → /dev/null.
root@helios:~# systemctl restart nfs-server
Job for nfs-server.service canceled.
root@helios:~# systemctl status nfs-server
● nfs-server.service - NFS server and services
Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
Drop-In: /run/systemd/generator/nfs-server.service.d
         └─order-with-mounts.conf
Active: failed (Result: exit-code) since Fri 2020-09-25 14:21:46 UTC; 10s ago
Process: 3923 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Process: 3925 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
Process: 3931 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
Process: 3932 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
Main PID: 3925 (code=exited, status=1/FAILURE)
Sep 25 14:21:46 helios systemd[1]: Starting NFS server and services...
Sep 25 14:21:46 helios rpc.nfsd[3925]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
Sep 25 14:21:46 helios rpc.nfsd[3925]: rpc.nfsd: unable to set any sockets for nfsd
Sep 25 14:21:46 helios systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Sep 25 14:21:46 helios systemd[1]: nfs-server.service: Failed with result 'exit-code'.
Sep 25 14:21:46 helios systemd[1]: Stopped NFS server and services.
-----------------------------
So, now I'm confused. Does NFSv4 need rpcbind to be running, does
it just need it when it launches, or something else? I made a local
copy of the systemd service file and edited out the rpcbind
dependency, so it's not that.
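(For reference, the copy-and-edit route looks roughly like this, as a sketch; which Requires=/Wants=/After= line names rpcbind.socket varies between nfs-utils versions, and a drop-in won't do here because systemd unit dependencies can't be removed from a drop-in:
-----------------------------
cp /lib/systemd/system/nfs-server.service /etc/systemd/system/
# delete rpcbind.socket from the Requires=/Wants=/After= lines in the copy
systemctl daemon-reload
systemctl restart nfs-server
-----------------------------
)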
Do you have v2 and v3 turned off in /etc/nfs.conf?
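On versions of nfs-utils new enough to read it, that would look something like this stanza in /etc/nfs.conf (a sketch, using the [nfsd] keys from nfs.conf(5)):
-----------------------------
[nfsd]
vers2=n
vers3=n
-----------------------------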
It's an Ubuntu system, hence doesn't use /etc/nfs.conf; however, I do
have these variables set in /etc/default/nfs-kernel-server:
MOUNTD_NFS_V2="no"
MOUNTD_NFS_V3="no"
RPCMOUNTDOPTS="--manage-gids -N 2 -N 3"
Maybe this isn't the correct way to disable NFSv2/3, but it's all I
could find documented.
That should do it, but if you want to verify that it worked, you can
read /proc/fs/nfsd/versions.
That's it. The syntax above is *not* disabling NFSv3:
root@helios:~# cat /proc/fs/nfsd/versions
-2 +3 +4 +4.1 +4.2
Looking more closely.... Does nfs-kernel-server have an RPCNFSDOPTS
variable or something? rpc.nfsd needs to be run with -N 2 -N 3 as well.
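Per rpc.nfsd(8), -N disables one NFS version per flag, so whatever variable it comes from, the effective invocation should end up looking something like this (thread count illustrative):

/usr/sbin/rpc.nfsd -N 2 -N 3 8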
--b.
Hmmm, not exactly, but here are the relevant details from the
/usr/lib/systemd/system/nfs-server.service file:
-----------------------------
Wants=nfs-config.service
After=nfs-config.service
[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/sbin/exportfs -r
ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
-----------------------------
which I think explains why this isn't working properly, based on your
comment. The /run/sysconfig/nfs-utils file is assembled by
nfs-config.service from the /etc/default/nfs-kernel-server file:
root@helios:/run/sysconfig# cat nfs-utils
PIPEFS_MOUNTPOINT=/run/rpc_pipefs
RPCNFSDARGS=" 16"
RPCMOUNTDARGS="--manage-gids -N 2 -N 3"
STATDARGS=""
RPCSVCGSSDARGS=""
SVCGSSDARGS=""
So rpc.nfsd is only being started with $RPCNFSDARGS and not $RPCMOUNTDARGS.
I think what you're saying is that I need to add $RPCMOUNTDARGS (or at
least its -N 2 -N 3 options) to the rpc.nfsd command line in the service file?
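One way to test that idea without touching the packaged files at all would be a drop-in that overrides ExecStart (a sketch; the file name is arbitrary and this isn't necessarily the cleanest long-term fix):
-----------------------------
# /etc/systemd/system/nfs-server.service.d/v4only.conf
[Service]
# the empty assignment clears the packaged ExecStart before replacing it
ExecStart=
ExecStart=/usr/sbin/rpc.nfsd -N 2 -N 3 $RPCNFSDARGS
-----------------------------
After a systemctl daemon-reload and a restart, /proc/fs/nfsd/versions should report -2 -3 rather than -2 +3.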
Ugh, lengthy aside: I'm finding so many bugs in Debian/Ubuntu packaging
based on packagers' minimal understanding of how NFS/autofs work. We do
computational biology, and for almost a year we were plagued by a performance
slowdown that boiled down to these two lines in /etc/passwd:
syslog:x:102:106::/home/syslog:/usr/sbin/nologin
cups-pk-helper:x:124:118:user for cups-pk-helper service,,,:/home/cups-pk-helper:/usr/sbin/nologin
Notice the references to non-existent home directories. On Arch Linux
systems these are set to /:
cups:x:209:209:cups helper user:/:/sbin/nologin
On workstations without network filesystems this is harmless, but we use
autofs for home directory mounts, and the biologists run their software
from anaconda environments. A rather poor design decision, but when
launched, mini/anaconda scans through /etc/passwd looking for places
where environments might be hidden away. autofs was hanging on every
attempted access of a non-existent home directory. As experienced by the
researchers, they would try to run a program and it would just hang for
5-10 minutes while loading some Python module.
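A quick sanity check for this class of problem, for what it's worth (a sketch; run it somewhere the autofs map isn't active, or the test itself will trigger the same mount attempts):
-----------------------------
# list accounts whose home directory doesn't exist on this machine
awk -F: '$6 != "" { print $1, $6 }' /etc/passwd | while read -r user home; do
    [ -e "$home" ] || echo "$user -> missing $home"
done

# then point the offending system accounts somewhere harmless, e.g.:
usermod -d / cups-pk-helper
-----------------------------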
This is why I was complaining about documentation. There's now a whole
generation of IT professionals for whom NFS is entirely opaque due to a
lack of up-to-date documentation.
The Linux kernel version is 5.4.0, and the nfs-kernel-server package
version is 1:1.3.4-2.5ubuntu3.3 (so upstream 1.3.4), but I'm not
sure this is relevant.
I can't reproduce the problem on my 5.9-ish server, but I also can't
recall any relevant changes here.
Looking back through the history.... Kinglong Mee fixed the server to
ignore rpcbind failures in the v4-only case about 7 years ago, back in
4.13.
--b.