Issues with GlusterFS 3.2.x: NFS and Transport endpoint is not connected

jkahn at idea11.com.au (James Kahn) · Mon, 9 Jul 2012 01:56:57 +0000

Hi there,

We're currently using GlusterFS 3.2.7 and experiencing issues with both NFS
and GlusterFS mounts. We were on 3.2.6 and had similar issues. Our install
is reasonably small. I'm hesitant to move to 3.3.x as we're looking for a
stable platform, not the latest features.

Our setup
---------
* 1 x GlusterFS server running CentOS 6.2 and GlusterFS 3.2.7
* Using CTDB as per
http://download.gluster.com/pub/gluster/systems-engineering/Gluster_CTDB_setup.v1.pdf
* Two volumes: ctdb-lock (1 brick) and storage1 (2 bricks). The ctdb-lock
brick is very small and on ext4; the storage1 bricks are 2TB XFS volumes
* CTDB is configured because we want to load balance NFS in the future

Essentially, there are two issues.

First issue
-----------
GlusterFS NFS works when first initiated but performance rapidly drops and
memory usage increases. After around 12 hours of use (and writing upwards
of 200GB) memory usage for the GlusterFS NFS process is around 8GB. This
is on a machine with 4GB RAM. This looks like a pretty significant memory
leak. Restarting the GlusterFS service restores memory and performance. The
problem does not clear up on its own.

This is causing issues as the clients writing the data slow down and their
write jobs eventually fail. Writes initially run at around 850Mbps (over
gigabit) and slow to around 30Mbps over time.
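
For anyone wanting to track the memory growth described above, here is a
minimal sketch that samples a process's memory from /proc. It assumes
Python 3 on Linux; the function names are illustrative, and the PID of the
GlusterFS NFS process has to be looked up separately (for example with ps).
It reports both VmRSS and VmSize, since resident memory alone cannot exceed
physical RAM.

```python
import time

def parse_status_kb(status_text, field):
    """Return the kB value of a field such as 'VmRSS' from the text of
    /proc/<pid>/status, or None if the field is not present."""
    prefix = field + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None

def read_memory_kb(pid):
    """Read resident (VmRSS) and virtual (VmSize) memory for one PID."""
    with open("/proc/%d/status" % pid) as f:
        text = f.read()
    return parse_status_kb(text, "VmRSS"), parse_status_kb(text, "VmSize")

def watch(pid, interval_s=60, samples=5):
    """Print timestamped memory samples so growth over time is visible."""
    for _ in range(samples):
        rss, size = read_memory_kb(pid)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "rss_kb=%s" % rss, "size_kb=%s" % size)
        time.sleep(interval_s)

# Example (hypothetical PID): watch(12345, interval_s=300, samples=12)
```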

Second issue
------------
In an attempt to work around the first issue we disabled GlusterFS NFS on
the volumes (nfs.register-with-portmap: off and nfs.disable: on), mounted
the volumes locally on the GlusterFS server and re-exported with native
NFS on the CentOS server.
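
As a rough illustration of the first step, the two options can be set per
volume with "gluster volume set". The sketch below only builds and prints
the commands by default. The helper names and the dry-run switch are
assumptions for illustration, not a transcript of what was run here.

```python
import subprocess

def nfs_disable_commands(volume):
    """Build the 'gluster volume set' commands that turn off the built-in
    GlusterFS NFS server for one volume."""
    return [
        ["gluster", "volume", "set", volume, "nfs.register-with-portmap", "off"],
        ["gluster", "volume", "set", volume, "nfs.disable", "on"],
    ]

def apply_commands(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)

# Example: apply_commands(nfs_disable_commands("storage1"))
```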

After a certain amount of time, or once a certain amount of data has been
written, the NFS writes start to fail. On investigation, this appears to be
due to the GlusterFS mount on the server failing; after remounting
GlusterFS it works again. However, this GlusterFS volume is mounted twice,
on /storage1 and /gluster/storage1, and only one of the mount points failed.

/etc/fstab snippet:
localhost:/ctdb-lock	/gluster/ctdb-lock	glusterfs	rw,_netdev	0 0
localhost:/storage1	/gluster/storage1	glusterfs	rw,_netdev	0 0
localhost:/storage1	/storage1		glusterfs	rw,_netdev	0 0

/etc/exports:
/storage1	*(rw,fsid=1,sync,no_root_squash,mp=/storage1)

Error:
[root@HOST /]# cd storage1
-bash: cd: storage1: Transport endpoint is not connected
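
A small sketch of how this condition could be detected from a script: the
error above corresponds to errno ENOTCONN, which a stat() on the dead mount
point should raise. This assumes Python 3; the function names are
illustrative, and it only detects the failure rather than remounting.

```python
import errno
import os

def mount_state(path):
    """Classify a mount point as 'ok', 'disconnected' (ENOTCONN, i.e.
    'Transport endpoint is not connected'), or 'error: <reason>'."""
    try:
        os.stat(path)
    except OSError as e:
        if e.errno == errno.ENOTCONN:
            return "disconnected"
        return "error: " + os.strerror(e.errno)
    return "ok"

def check_mounts(paths):
    """Return a dict mapping each path to its state."""
    return {p: mount_state(p) for p in paths}

# Example: check_mounts(["/storage1", "/gluster/storage1"])
```

Running this against both mount points of the same volume would show
whether only one of them has dropped.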

What am I doing wrong, or what can I do to work around this issue? We need
reliable NFS and Gluster isn't giving it to us right now.

Thanks,
JK.