On Mon, Jan 26, 2015 at 12:26:53AM +0000, Peter Auyeung wrote:
> Hi Niels,
>
> The question is that we keep getting the lockd error even after
> restarting and rebooting the NFS client.

This particular error only occurs when the NFS-server could not
register the nlockmgr RPC-program at rpcbind/portmapper. The most
likely scenario where this fails is when there is an NFS-client (or
service) on the storage server that conflicts with the Gluster/NFS
service.

If there are conflicting RPC services registered at rpcbind/portmapper,
you may be able to check and remove them with the 'rpcinfo' command.
Ports that are listed in its output, but are not listed in netstat/ss,
are in use by kernel services (like the lockd kernel module).

In order to restore the NLM functionality of Gluster/NFS, you can take
these steps:

1. Ensure that no other NFS-services (server or client) are running on
   the Gluster storage server. Gluster/NFS should be the only service
   doing any NFS on that server.
2. Stop the rpcbind service.
3. Clear the rpcbind cache (rm /var/lib/rpcbind/portmap.xdr).
4. Start the rpcbind service.
5. Restart the Gluster/NFS service.

A shell sketch of these checks and steps follows at the bottom of this
mail.

In case your NFS-client got connected to the incorrect NLM service on
your storage server, you will need to unmount and mount the export
again.

Niels

> Peter
> ________________________________________
> From: Niels de Vos [ndevos@xxxxxxxxxx]
> Sent: Saturday, January 24, 2015 3:26 AM
> To: Peter Auyeung
> Cc: gluster-users@xxxxxxxxxxx; gluster-devel@xxxxxxxxxxx
> Subject: Re: [Gluster-devel] lockd: server not responding, timed out
>
> On Fri, Jan 23, 2015 at 11:50:26PM +0000, Peter Auyeung wrote:
> > We have a 6-node Gluster setup running Ubuntu on XFS, sharing
> > Gluster volumes over NFS; it has been running fine for 3 months.
> > We restarted glusterfs-server on one of the nodes, and all NFS
> > clients started getting "lockd: server not responding, timed out"
> > in /var/log/messages.
> >
> > We are still able to read and write, but processes that require a
> > persistent file lock fail, like database exports.
> >
> > We have an interim fix to remount the NFS exports with the nolock
> > option, but we need to know why that is suddenly necessary after a
> > 'service glusterfs-server restart' on one of the Gluster nodes.
>
> The reason that you need to mount with 'nolock' is that one server
> can only have one NLM-service active. The Linux NFS-client uses the
> 'lockd' kernel module, and the Gluster/NFS server provides its own
> lock manager. To be able to use a lock manager, it needs to be
> registered at rpcbind/portmapper. Only one lock manager can be
> registered at a time; the 2nd one that tries to register will fail.
> In case the NFS-client has registered the lockd kernel module as the
> lock manager, any locking requests to the Gluster/NFS service will
> fail and you will see those messages in /var/log/messages.
>
> This is one of the main reasons why it is not advised to access
> volumes over NFS on a Gluster storage server. You should rather use
> the GlusterFS protocol for mounting volumes locally. (Or even better,
> separate your storage servers from the application servers.)
>
> HTH,
> Niels
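
As a sketch of the checks and recovery steps above (the service names
'rpcbind' and 'glusterfs-server' and the cache path are assumptions
that match the Ubuntu/Debian setup from this thread; adjust for your
distribution):

    # Look for a conflicting lock manager: nlockmgr is RPC program
    # 100021, and it can end up registered by the kernel lockd module
    # instead of by Gluster/NFS.
    rpcinfo -p

    # Compare the ports rpcinfo lists with the sockets held by
    # userspace processes; a port that rpcinfo shows but ss/netstat
    # does not is in use by a kernel service such as lockd.
    ss -tulpn        # or: netstat -tulpn

    # A stale registration can be removed per program/version, e.g.
    # for NLM (100021) versions 1, 3 and 4:
    rpcinfo -d 100021 1
    rpcinfo -d 100021 3
    rpcinfo -d 100021 4

    # Then restart rpcbind with a clean cache and restart Gluster/NFS:
    service rpcbind stop
    rm -f /var/lib/rpcbind/portmap.xdr
    service rpcbind start
    service glusterfs-server restart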
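
On the client side, if it negotiated locking against the wrong NLM
service, the export has to be re-mounted (server name, volume and
mountpoint below are just example values; Gluster/NFS serves NFSv3):

    # Re-mount so the client locks against the fresh lock manager:
    umount /mnt/data
    mount -t nfs -o vers=3 server1:/myvolume /mnt/data

    # The interim workaround from the earlier mail disables NLM
    # locking on the client entirely:
    mount -t nfs -o vers=3,nolock server1:/myvolume /mnt/data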