On Fri, Nov 07, 2014 at 07:51:47PM -0500, Jason Russler wrote:
> I've run into this as well, after installing hosted-engine for oVirt
> on a gluster volume. The only way to get things working again for me
> was to manually de-register (rpcinfo -d ...) nlockmgr from the
> portmapper and then restart glusterd. Then gluster's NFS successfully
> registers. I don't really get what's going on, though.

Is this on RHEL/CentOS 7? A couple of days back someone on IRC had an
issue with this as well. We found out that "rpcbind.service" uses the
"-w" option by default (for warm-restart). Registered services are
written to a cache file, and upon reboot those services get
re-registered automatically, even when they are not running.

The solution was something like this:

  # cp /usr/lib/systemd/system/rpcbind.service /etc/systemd/system/
  * edit /etc/systemd/system/rpcbind.service and remove the "-w" option
  # systemctl daemon-reload
  # systemctl restart rpcbind.service
  # systemctl restart glusterd.service

I am not sure why "-w" was added by default, but it does not seem to
play nice with Gluster/NFS. Gluster/NFS does not want to break other
registered services, so it bails out when something is registered
already.

HTH, Niels

> ----- Original Message -----
> From: "Sven Achtelik" <Sven.Achtelik@xxxxxxxxxxx>
> To: gluster-users@xxxxxxxxxxx
> Sent: Friday, November 7, 2014 5:28:32 PM
> Subject: Re: NFS not start on localhost
>
> > Hi everyone,
> >
> > I'm facing the exact same issue on my installation. The nfs.log
> > entries indicate that something is blocking the Gluster NFS server
> > from registering with rpcbind.
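A note on the unit-file edit above: systemd prefers a copy in /etc/systemd/system/ over the one in /usr/lib/systemd/system/, so the change survives package updates. Below is a minimal sketch of the "-w" removal. It works on a scratch stand-in for the stock unit file (the contents are an illustration, not a verbatim copy of the RHEL/CentOS 7 unit), so it can be tried safely anywhere:

```shell
# Copy the unit file and strip the standalone "-w" (warm-restart) flag
# from its ExecStart line, leaving everything else untouched.
workdir=$(mktemp -d)
src="$workdir/rpcbind.service"
dst="$workdir/rpcbind.service.fixed"

# Stand-in for the stock unit file (illustrative, not verbatim).
cat > "$src" <<'EOF'
[Unit]
Description=RPC bind service

[Service]
Type=forking
ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS
EOF

# Remove " -w " (or a trailing " -w") wherever it appears.
sed 's/ -w / /; s/ -w$//' "$src" > "$dst"

grep '^ExecStart=' "$dst"
# prints: ExecStart=/sbin/rpcbind $RPCBIND_ARGS
```

On a real host you would write the result to /etc/systemd/system/rpcbind.service and run `systemctl daemon-reload` afterwards, exactly as in the steps above.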
> >
> > [root@ovirt-one ~]# rpcinfo -p
> >    program vers proto   port  service
> >     100000    4   tcp    111  portmapper
> >     100000    3   tcp    111  portmapper
> >     100000    2   tcp    111  portmapper
> >     100000    4   udp    111  portmapper
> >     100000    3   udp    111  portmapper
> >     100000    2   udp    111  portmapper
> >     100005    3   tcp  38465  mountd
> >     100005    1   tcp  38466  mountd
> >     100003    3   tcp   2049  nfs
> >     100227    3   tcp   2049  nfs_acl
> >     100021    3   udp  34343  nlockmgr
> >     100021    4   udp  34343  nlockmgr
> >     100021    3   tcp  54017  nlockmgr
> >     100021    4   tcp  54017  nlockmgr
> >     100024    1   udp  39097  status
> >     100024    1   tcp  53471  status
> >     100021    1   udp    715  nlockmgr
> >
> > I'm sure that I'm not using the system NFS server, and I didn't
> > mount any NFS share.
> >
> > @Tibor: Did you solve that issue somehow?
> >
> > Best,
> >
> > Sven
>
> Hi,
> Thank you for your reply.
> I followed your recommendations, but there are no changes.
> There is nothing new in the nfs.log.
> [root@node0 glusterfs]# reboot
> Connection to 172.16.0.10 closed by remote host.
> Connection to 172.16.0.10 closed.
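The stale nlockmgr entries in a listing like the one above are exactly what `rpcinfo -d <program> <version>`, as Jason described, would de-register. A small sketch that pulls the program/version pairs out of such a listing; it parses a captured excerpt rather than calling rpcinfo, so it can be run anywhere:

```shell
# Extract the program number and version of every nlockmgr registration
# from `rpcinfo -p` output; these pairs are the arguments `rpcinfo -d`
# expects. A captured excerpt stands in for live output here.
rpcinfo_output='
 100003    3   tcp   2049  nfs
 100021    3   udp  34343  nlockmgr
 100021    4   udp  34343  nlockmgr
 100021    3   tcp  54017  nlockmgr
 100021    4   tcp  54017  nlockmgr
 100024    1   udp  39097  status
'

# Print each unique program/version pair (field 5 is the service name).
printf '%s\n' "$rpcinfo_output" |
    awk '$5 == "nlockmgr" { print $1, $2 }' | sort -u
# prints:
# 100021 3
# 100021 4
```

On a live host you would pipe real `rpcinfo -p` output in and then, as root, run `rpcinfo -d` on each printed pair (e.g. `rpcinfo -d 100021 3`) before restarting glusterd so Gluster/NFS can register.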
> [tdemeter@sirius-31 ~]$ ssh root@172.16.0.10
> root@172.16.0.10's password:
> Last login: Mon Oct 20 11:02:13 2014 from 192.168.133.106
> [root@node0 ~]# systemctl status nfs.target
> nfs.target - Network File System Server
>    Loaded: loaded (/usr/lib/systemd/system/nfs.target; disabled)
>    Active: inactive (dead)
> [root@node0 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick gs00.itsmart.cloud:/gluster/engine0       50160   Y       3271
> Brick gs01.itsmart.cloud:/gluster/engine1       50160   Y       595
> NFS Server on localhost                         N/A     N       N/A
> Self-heal Daemon on localhost                   N/A     Y       3286
> NFS Server on gs01.itsmart.cloud                2049    Y       6951
> Self-heal Daemon on gs01.itsmart.cloud          N/A     Y       6958
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@node0 ~]# systemctl status
> Display all 262 possibilities? (y or n)
> [root@node0 ~]# systemctl status nfs-lock
> nfs-lock.service - NFS file locking service.
>    Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; enabled)
>    Active: inactive (dead)
> [root@node0 ~]# systemctl stop nfs-lock
> [root@node0 ~]# systemctl restart gluster
> glusterd.service    glusterfsd.service    gluster.mount
> [root@node0 ~]# systemctl restart gluster
> glusterd.service    glusterfsd.service    gluster.mount
> [root@node0 ~]# systemctl restart glusterfsd.service
> [root@node0 ~]# systemctl restart glusterd.service
> [root@node0 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick gs00.itsmart.cloud:/gluster/engine0       50160   Y       5140
> Brick gs01.itsmart.cloud:/gluster/engine1       50160   Y       2037
> NFS Server on localhost                         N/A     N       N/A
> Self-heal Daemon on localhost                   N/A     N       N/A
> NFS Server on gs01.itsmart.cloud                2049    Y       6951
> Self-heal Daemon on gs01.itsmart.cloud          N/A     Y       6958
>
> Any other idea?
>
> Tibor
>
> ----- Original message -----
> > On Mon, Oct 20, 2014 at 09:04:28AM +0200, Demeter Tibor wrote:
> > > Hi,
> > >
> > > This is the full nfs.log after delete & reboot.
> > > It refers to a portmap registration problem.
> > >
> > > [root@node0 glusterfs]# cat nfs.log
> > > [2014-10-20 06:48:43.221136] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.2 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/567e0bba7ad7102eae3049e2ad6c3ed7.socket)
> > > [2014-10-20 06:48:43.224444] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.224475] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
> > > [2014-10-20 06:48:43.224654] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.224667] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
> > > [2014-10-20 06:48:43.235876] I [rpcsvc.c:2127:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
> > > [2014-10-20 06:48:43.254087] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.254116] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.255241] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.255264] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.257279] I [socket.c:3561:socket_init] 0-socket.nfs-server: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.257315] I [socket.c:3576:socket_init] 0-socket.nfs-server: using system polling thread
> > > [2014-10-20 06:48:43.258135] I [socket.c:3561:socket_init] 0-socket.NLM: SSL support is NOT enabled
> > > [2014-10-20 06:48:43.258157] I [socket.c:3576:socket_init] 0-socket.NLM: using system polling thread
> > > [2014-10-20 06:48:43.293724] E [rpcsvc.c:1314:rpcsvc_program_register_portmap] 0-rpc-service: Could not register with portmap
> > > [2014-10-20 06:48:43.293760] E [nfs.c:332:nfs_init_versions] 0-nfs: Program NLM4 registration failed
> >
> > The above line suggests that there already is a service registered
> > at the portmapper for the NLM4 program/service. This happens when
> > the kernel module 'lockd' is loaded. The kernel NFS client and
> > NFS server depend on it, but unfortunately it conflicts with the
> > Gluster/nfs server.
> >
> > Could you verify that the module is loaded?
> > - use 'lsmod | grep lockd' to check the modules
> > - use 'rpcinfo | grep nlockmgr' to check the rpcbind registrations
> >
> > Make sure that you do not mount any NFS exports on the Gluster
> > server. Unmount all NFS mounts.
> >
> > You mentioned you are running CentOS 7, which is systemd-based. You
> > should be able to stop any conflicting NFS services like this:
> >
> >   # systemctl stop nfs-lock.service
> >   # systemctl stop nfs.target
> >   # systemctl disable nfs.target
> >
> > If all these services clean up after themselves, you should be able
> > to start the Gluster/nfs service:
> >
> >   # systemctl restart glusterd.service
> >
> > In case some bits are still lingering around, it might be easier to
> > reboot after disabling the 'nfs.target'.
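The two checks Niels lists (lsmod for 'lockd', rpcinfo for 'nlockmgr') can be rolled into a small pre-flight helper. This is only a sketch of the decision logic; the two arguments are fed canned command output here so it can be exercised anywhere, while on a real host they would come from `lsmod` and `rpcinfo -p`:

```shell
# Decide whether the kernel NFS lock manager would conflict with
# Gluster/nfs: either the 'lockd' module is loaded, or 'nlockmgr' is
# already registered with rpcbind.
check_nfs_conflict() {
    lsmod_out=$1
    rpcinfo_out=$2
    if printf '%s\n' "$lsmod_out" | grep -q '^lockd'; then
        echo "conflict: kernel module 'lockd' is loaded"
        return 1
    fi
    if printf '%s\n' "$rpcinfo_out" | grep -q 'nlockmgr'; then
        echo "conflict: nlockmgr already registered with rpcbind"
        return 1
    fi
    echo "no conflict: Gluster/nfs should be able to register"
    return 0
}

# Canned sample data standing in for `lsmod` and `rpcinfo -p` output.
sample_lsmod='lockd 93977 2 nfs,nfsd'
sample_rpcinfo=' 100021    3   udp  34343  nlockmgr'

check_nfs_conflict "$sample_lsmod" "$sample_rpcinfo"
# prints: conflict: kernel module 'lockd' is loaded
```

If either check trips, stopping nfs-lock.service and nfs.target (and unmounting any NFS mounts) as described above should clear the conflict before restarting glusterd.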
> > >
> > > [2014-10-20 06:48:43.293771] E [nfs.c:1312:init] 0-nfs: Failed to initialize protocols
> > > [2014-10-20 06:48:43.293777] E [xlator.c:403:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
> > > [2014-10-20 06:48:43.293783] E [graph.c:307:glusterfs_graph_init] 0-nfs-server: initializing translator failed
> > > [2014-10-20 06:48:43.293789] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
> > > pending frames:
> > > frame : type(0) op(0)
> > >
> > > patchset: git://git.gluster.com/glusterfs.git
> > > signal received: 11
> > > time of crash: 2014-10-20 06:48:43
> > > configuration details:
> > > argp 1
> > > backtrace 1
> > > dlfcn 1
> > > fdatasync 1
> > > libpthread 1
> > > llistxattr 1
> > > setfsid 1
> > > spinlock 1
> > > epoll.h 1
> > > xattr.h 1
> > > st_atim.tv_nsec 1
> > > package-string: glusterfs 3.5.2
> > > [root@node0 glusterfs]# systemctl status portma
> > > portma.service
> > >    Loaded: not-found (Reason: No such file or directory)
> > >    Active: inactive (dead)
> > >
> > > Also I have checked the rpcbind service.
> > >
> > > [root@node0 glusterfs]# systemctl status rpcbind.service
> > > rpcbind.service - RPC bind service
> > >    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
> > >    Active: active (running) since h 2014-10-20 08:48:39 CEST; 2min 52s ago
> > >   Process: 1940 ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS} (code=exited, status=0/SUCCESS)
> > >  Main PID: 1946 (rpcbind)
> > >    CGroup: /system.slice/rpcbind.service
> > >            └─1946 /sbin/rpcbind -w
> > >
> > > okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Starting RPC bind service...
> > > okt 20 08:48:39 node0.itsmart.cloud systemd[1]: Started RPC bind service.
> > >
> > > Restarting does not solve this problem.
> > >
> > > I think this is the problem. Why is the portmap status "exited"?
> >
> > The 'portmap' service has been replaced by 'rpcbind' since RHEL-6.
> > They have the same functionality; 'rpcbind' just happens to be the
> > newer implementation.
> >
> > Did you file a bug for this already? As Vijay mentions, this crash
> > seems to happen because the Gluster/nfs service fails to initialize
> > correctly and then fails to clean up correctly. The cleanup should
> > get fixed, and we should also give an easier-to-understand error
> > message.
> >
> > Thanks,
> > Niels
> >
> > > On node1 it is OK:
> > >
> > > [root@node1 ~]# systemctl status rpcbind.service
> > > rpcbind.service - RPC bind service
> > >    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled)
> > >    Active: active (running) since p 2014-10-17 19:15:21 CEST; 2 days ago
> > >  Main PID: 1963 (rpcbind)
> > >    CGroup: /system.slice/rpcbind.service
> > >            └─1963 /sbin/rpcbind -w
> > >
> > > okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Starting RPC bind service...
> > > okt 17 19:15:21 node1.itsmart.cloud systemd[1]: Started RPC bind service.
> > >
> > > Thanks in advance
> > >
> > > Tibor
> > >
> > > ----- Original message -----
> > > > On 10/19/2014 06:56 PM, Niels de Vos wrote:
> > > > > On Sat, Oct 18, 2014 at 01:24:12PM +0200, Demeter Tibor wrote:
> > > > >> Hi,
> > > > >>
> > > > >> [root@node0 ~]# tail -n 20 /var/log/glusterfs/nfs.log
> > > > >> [2014-10-18 07:41:06.136035] E [graph.c:307:glusterfs_graph_init] 0-nfs-server: initializing translator failed
> > > > >> [2014-10-18 07:41:06.136040] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
> > > > >> pending frames:
> > > > >> frame : type(0) op(0)
> > > > >>
> > > > >> patchset: git://git.gluster.com/glusterfs.git
> > > > >> signal received: 11
> > > > >> time of crash: 2014-10-18 07:41:06
> > > > >> configuration details:
> > > > >> argp 1
> > > > >> backtrace 1
> > > > >> dlfcn 1
> > > > >> fdatasync 1
> > > > >> libpthread 1
> > > > >> llistxattr 1
> > > > >> setfsid 1
> > > > >> spinlock 1
> > > > >> epoll.h 1
> > > > >> xattr.h 1
> > > > >> st_atim.tv_nsec 1
> > > > >> package-string: glusterfs 3.5.2
> > > > >
> > > > > This definitely is a gluster/nfs issue. For whatever reason,
> > > > > the gluster/nfs server crashes :-/ The log does not show
> > > > > enough details; some more lines before this are needed.
> > > >
> > > > I wonder if the crash is due to a cleanup after the translator
> > > > initialization failure. The complete logs might help in
> > > > understanding why the initialization failed.
> > > >
> > > > -Vijay
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users