gnfs socket errors during client mount, unmount and patch to illustrate

Erik Jacobson <erik.jacobson@xxxxxxx> · Tue, 22 Feb 2022 12:34:52 -0600

We have hacked around these errors, which glusterfs79 and glusterfs93
produce when an NFS client mounts or unmounts. On one of the installed
supercomputers, one of the gluster server nfs.log files was over 1GB.

    [2022-02-21 22:39:32.803070 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 172.23.0.5:60126 failed (No data available)
    [2022-02-21 22:39:32.806102 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 172.23.0.5:919 failed (No data available)
    [2022-02-21 22:39:32.863435 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 172.23.0.5:60132 failed (No data available)
    [2022-02-21 22:39:32.864202 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 172.23.0.5:673 failed (No data available)
    [2022-02-21 22:39:32.934893 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 172.23.0.5:857 failed (No data available)
    [2022-02-21 22:39:48.744882 +0000] W [socket.c:767:__socket_rwv] 0-socket.nfs-server: readv on 127.0.0.1:949 failed (No data available)

We hacked around this with the following patch, which is not intended for
inclusion but illustrates the issue. Since we are not experts in the gluster
code, we isolated the change to the nfs-server use of socket.c. We understand
that this is likely not the appropriate convention for a released patch.


diff -Narup glusterfs-9.3-orig/rpc/rpc-transport/socket/src/socket.c glusterfs-9.3/rpc/rpc-transport/socket/src/socket.c

--- glusterfs-9.3-orig/rpc/rpc-transport/socket/src/socket.c	2021-06-29 00:27:44.382408295 -0500
+++ glusterfs-9.3/rpc/rpc-transport/socket/src/socket.c	2022-02-21 20:23:41.101667807 -0600
@@ -733,6 +733,15 @@ __socket_rwv(rpc_transport_t *this, stru
         } else {
             ret = __socket_cached_read(this, opvector, opcount);
             if (ret == 0) {
+                if(strcmp(this->name,"nfs-server")) {
+                   /* nfs mount, unmount can produce ENODATA */
+                   gf_log(this->name, GF_LOG_DEBUG,
+                         "HPE - EOF from peer %s, since NFS, return ENOTCONN",
+                         this->peerinfo.identifier);
+                   opcount = -1;
+                   errno = ENOTCONN;
+                   break;
+                }
                 gf_log(this->name, GF_LOG_DEBUG,
                        "EOF on socket %d (errno:%d:%s); returning ENODATA",
                        sock, errno, strerror(errno));
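
For readers who do not know socket.c, here is a minimal standalone sketch
of the behavior involved. It is not gluster code: read_map_eof() and
eof_errno_demo() are hypothetical names used only for illustration, and a
local socketpair() stands in for the NFS client connection. It shows that a
peer closing its end makes read() return 0, and that the caller then
chooses which errno to report for that EOF (ENODATA, as in the warnings
above, or ENOTCONN, as in the hack).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of gluster): read once from fd and turn a
 * zero-byte read, i.e. an orderly EOF from the peer, into an error.
 * If eof_is_disconnect is nonzero the EOF is reported as ENOTCONN,
 * otherwise as ENODATA. Other results of read() are passed through. */
static ssize_t read_map_eof(int fd, void *buf, size_t len,
                            int eof_is_disconnect)
{
    ssize_t n = read(fd, buf, len);

    if (n == 0) {
        errno = eof_is_disconnect ? ENOTCONN : ENODATA;
        return -1;
    }
    return n;
}

/* Simulate a client that connects and goes away without sending data.
 * Returns the errno seen by the "server" side, 0 if data was read,
 * or -1 if the socket pair could not be created. */
int eof_errno_demo(int eof_is_disconnect)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]); /* the "client" end closes immediately */

    n = read_map_eof(sv[0], buf, sizeof(buf), eof_is_disconnect);
    err = (n < 0) ? errno : 0; /* save errno before close() can change it */

    close(sv[0]);
    return err;
}
```

In this sketch, eof_errno_demo(0) returns ENODATA and eof_errno_demo(1)
returns ENOTCONN. On glibc, strerror(ENODATA) is "No data available", the
same text that appears in the nfs.log warnings.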




* We understand you want us to move to Ganesha NFS. I mentioned in my
other notes that we are unable to move because of problems with Ganesha
when serving the NFS root in sles15sp3 (it gets stuck with nscd when nscd
is opening the passwd and group files). While sles15sp4 fixes that, Ganesha
seems to be 25-35% slower than Gluster NFS, which could require us to
add more gluster/ganesha servers to already installed systems just to
service the software update. We hope we can provide test cases to the
Ganesha community and see if we can help speed it up for our workloads.
In our next release, we have tooling in place to support Ganesha as a
tech preview, off by default, so it will be there to experiment with and
to compare the two in our installations.