On 10/1/20 4:43 PM, J. Bruce Fields wrote:
On Thu, Oct 01, 2020 at 04:05:13PM -0500, Patrick Goetz wrote:
On 10/1/20 3:06 PM, J. Bruce Fields wrote:
On Thu, Oct 01, 2020 at 01:41:39PM -0500, Patrick Goetz wrote:
Hi Bruce,
Thanks for the reply. See below.
On 10/1/20 1:30 PM, J. Bruce Fields wrote:
On Fri, Sep 25, 2020 at 09:40:16AM -0500, Patrick Goetz wrote:
My university's information security office does not like rpcbind and
will automatically quarantine any system on which they detect a
portmapper listening on an exposed port.
Since I exclusively use NFSv4, I was happy to "learn" that NFSv4
doesn't require rpcbind any more. For example, here's what the
current RHEL documentation says:
"NFS version 4 (NFSv4) works through firewalls and on the Internet,
no longer requires an rpcbind service, supports Access Control Lists
(ACLs), and utilizes stateful operations."
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/exporting-nfs-shares_managing-file-systems#introduction-to-nfs_exporting-nfs-shares
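(In practice, "works through firewalls" comes down to a v4-only server needing just TCP port 2049 reachable; on the Ubuntu side that is roughly the following, assuming ufw is the firewall in use:

ufw allow 2049/tcp

with everything else, including rpcbind's port 111, left closed.)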
I'm using Ubuntu 20.04 rather than RHEL, but the nfs-server service
absolutely will not start if it can't launch rpcbind as a precursor:
-----------------------------
root@helios:~# systemctl stop rpcbind
Warning: Stopping rpcbind.service, but it can still be activated by:
rpcbind.socket
root@helios:~# systemctl mask rpcbind
Created symlink /etc/systemd/system/rpcbind.service → /dev/null.
root@helios:~# systemctl restart nfs-server
Job for nfs-server.service canceled.
root@helios:~# systemctl status nfs-server
● nfs-server.service - NFS server and services
Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
Drop-In: /run/systemd/generator/nfs-server.service.d
         └─order-with-mounts.conf
Active: failed (Result: exit-code) since Fri 2020-09-25 14:21:46 UTC; 10s ago
Process: 3923 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Process: 3925 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
Process: 3931 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
Process: 3932 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
Main PID: 3925 (code=exited, status=1/FAILURE)
Sep 25 14:21:46 helios systemd[1]: Starting NFS server and services...
Sep 25 14:21:46 helios rpc.nfsd[3925]: rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
Sep 25 14:21:46 helios rpc.nfsd[3925]: rpc.nfsd: unable to set any sockets for nfsd
Sep 25 14:21:46 helios systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Sep 25 14:21:46 helios systemd[1]: nfs-server.service: Failed with result 'exit-code'.
Sep 25 14:21:46 helios systemd[1]: Stopped NFS server and services.
-----------------------------
So, now I'm confused. Does NFSv4 need rpcbind to be running, does
it just need it when it launches, or something else? I made a local
copy of the systemd service file and edited out the rpcbind
dependency, so it's not that.
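(For reference, the copy-and-edit route looks roughly like this, as a sketch; which Requires=/Wants=/After= line names rpcbind.socket varies between nfs-utils versions, and a drop-in won't do here because systemd unit dependencies can't be removed from a drop-in:
-----------------------------
cp /lib/systemd/system/nfs-server.service /etc/systemd/system/
# delete rpcbind.socket from the Requires=/Wants=/After= lines in the copy
systemctl daemon-reload
systemctl restart nfs-server
-----------------------------
)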
Do you have v2 and v3 turned off in /etc/nfs.conf?
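On versions of nfs-utils new enough to read it, that would look something like this stanza in /etc/nfs.conf (a sketch, using the [nfsd] keys from nfs.conf(5)):
-----------------------------
[nfsd]
vers2=n
vers3=n
-----------------------------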
It's an Ubuntu system, hence doesn't use /etc/nfs.conf; however, I do
have these variables set in /etc/default/nfs-kernel-server:
MOUNTD_NFS_V2="no"
MOUNTD_NFS_V3="no"
RPCMOUNTDOPTS="--manage-gids -N 2 -N 3"
Maybe this isn't the correct way to disable NFSv2/3, but it's all I
could find documented.
That should do it, but if you want to verify that it worked, you can
read /proc/fs/nfsd/versions.
That's it. The syntax above is *not* disabling NFSv3:
root@helios:~# cat /proc/fs/nfsd/versions
-2 +3 +4 +4.1 +4.2
Looking more closely.... Does nfs-kernel-server have an RPCNFSDOPTS
variable or something? rpc.nfsd needs to be run with -N 2 -N 3 as well.
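Per rpc.nfsd(8), -N disables one NFS version per flag, so whatever variable it comes from, the effective invocation should end up looking something like this (thread count illustrative):

/usr/sbin/rpc.nfsd -N 2 -N 3 8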
--b.
Hmmm, not exactly, but here are the relevant details from the
/usr/lib/systemd/system/nfs-server.service file:
-----------------------------
Wants=nfs-config.service
After=nfs-config.service
[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/sbin/exportfs -r
ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
-----------------------------
which I think explains why this isn't working properly, based on your
comment. The /run/sysconfig/nfs-utils file is assembled by
nfs-config.service from the /etc/default/nfs-kernel-server file:
root@helios:/run/sysconfig# cat nfs-utils
PIPEFS_MOUNTPOINT=/run/rpc_pipefs
RPCNFSDARGS=" 16"
RPCMOUNTDARGS="--manage-gids -N 2 -N 3"
STATDARGS=""
RPCSVCGSSDARGS=""
SVCGSSDARGS=""
So rpc.nfsd is only being started with $RPCNFSDARGS and not $RPCMOUNTDARGS.
I think what you're saying is that I need to add $RPCMOUNTDARGS (or at
least its -N 2 -N 3 options) to the rpc.nfsd command line in the service file?
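One way to test that idea without touching the packaged files at all would be a drop-in that overrides ExecStart (a sketch; the file name is arbitrary and this isn't necessarily the cleanest long-term fix):
-----------------------------
# /etc/systemd/system/nfs-server.service.d/v4only.conf
[Service]
# the empty assignment clears the packaged ExecStart before replacing it
ExecStart=
ExecStart=/usr/sbin/rpc.nfsd -N 2 -N 3 $RPCNFSDARGS
-----------------------------
After a systemctl daemon-reload and a restart, /proc/fs/nfsd/versions should report -2 -3 rather than -2 +3.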
Ugh, lengthy aside: I'm finding so many bugs in Debian/Ubuntu packaging
based on packagers' minimal understanding of how NFS/autofs work. We do
computational biology, and for almost a year we were plagued by a performance
slowdown that boiled down to these two lines in /etc/passwd:
syslog:x:102:106::/home/syslog:/usr/sbin/nologin
cups-pk-helper:x:124:118:user for cups-pk-helper service,,,:/home/cups-pk-helper:/usr/sbin/nologin
Notice the references to non-existent home directories. On Arch Linux
systems these are set to /:
cups:x:209:209:cups helper user:/:/sbin/nologin
On workstations without network filesystems this is harmless, but we use
autofs for home directory mounts, and the biologists run their software
from anaconda environments. A rather poor design decision, but when
launched, mini/anaconda scans through /etc/passwd looking for places
where environments might be hidden away. autofs was hanging on every
attempted access of a non-existent home directory. As experienced by the
researchers, they would try to run a program and it would just hang for
5-10 minutes while loading some Python module.
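A quick sanity check for this class of problem, for what it's worth (a sketch; run it somewhere the autofs map isn't active, or the test itself will trigger the same mount attempts):
-----------------------------
# list accounts whose home directory doesn't exist on this machine
awk -F: '$6 != "" { print $1, $6 }' /etc/passwd | while read -r user home; do
    [ -e "$home" ] || echo "$user -> missing $home"
done

# then point the offending system accounts somewhere harmless, e.g.:
usermod -d / cups-pk-helper
-----------------------------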
This is why I was complaining about documentation. There's now a whole
generation of IT professionals for whom NFS is entirely opaque due to a
lack of up-to-date documentation.
The Linux kernel version is 5.4.0, and the nfs-kernel-server package
version is 1:1.3.4-2.5ubuntu3.3 (so upstream 1.3.4), but I'm not
sure this is relevant.
I can't reproduce the problem on my 5.9-ish server, but I also can't
recall any relevant changes here.
Looking back through the history.... Kinglong Mee fixed the server to
ignore rpcbind failures in the v4-only case about 7 years ago, back in
4.13.
--b.