nfs problems

d.a.bretherton at reading.ac.uk (Dan Bretherton) · Fri, 25 Feb 2011 17:55:25 +0000

> *David Lloyd* david.lloyd at v-consultants.co.uk 
> /Wed Feb 16 08:09:36 PST 2011/
>
> getting lots of stale nfs filehandle errors
>
> we have 4 nodes in our cluster, clients nfs mount the volume from any node
> in a round-robin
>
> it appears that one node has gone bad. the clients mounting that node can't
> see the files that the others can see. ls -l gives rubbish for the metadata,
> and get lots of these lines in the nfs.log:
>
> [2011-02-16 15:33:32.538756] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/people.1. holes=2 overlaps=0
> [2011-02-16 15:33:32.540759] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/people.nano. holes=2
> overlaps=0
> [2011-02-16 15:33:32.543682] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/people.2. holes=2 overlaps=0
> [2011-02-16 15:33:32.507428] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/skeleton. holes=2 overlaps=0
> [2011-02-16 15:33:32.509440] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/svn. holes=2 overlaps=0
> [2011-02-16 15:33:32.511275] I [dht-layout.c:588:dht_layout_normalize]
> glustervol1-dht: found anomalies in /production/tempo. holes=2 overlaps=0
>
> Any ideas?
>
> Thanks
> David
David,
I recently had a problem that produced error messages referring to 
"anomalies", "holes" and "overlaps" like those above, and quite often 
"mismatching layouts" as well.  In my case I had caused the problem 
myself, by mounting some of the backend filesystems without extended 
attribute support explicitly enabled.  That's obviously not the case 
here, but as nobody else has replied to this posting I thought I would 
briefly explain how I got rid of the errors in my volume.  Once the 
backend filesystems were mounted correctly (with the "user_xattr" mount 
option) I performed the following procedure to "sanitize" them, 
following advice from Gluster.  Note that I was advised to take a full 
backup before attempting this procedure.

1) Stop the volume with "gluster volume stop ..."
2) Run a script called "backend-xattr-sanitize.sh" on each of the 
backend filesystems.  The script can be downloaded from here: 
https://github.com/gluster/glusterfs/blob/master/extras/backend-xattr-sanitize.sh
3) Start the volume and mount it
4) Run "find .| xargs stat >>/dev/null 2>&1" in the mount point
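
The pipeline in step 4 stats every entry under the mount point, but plain "find | xargs" splits file names on whitespace, so names containing spaces are passed to stat in pieces. A null-delimited variant avoids that. The following is only a sketch: the function name stat_walk and the default mount point are assumptions for illustration, not part of the advice from Gluster.

```shell
#!/bin/sh
# stat_walk DIR: stat every entry under DIR and discard the output.
# -print0 / -0 pass names null-delimited, so spaces and newlines survive.
stat_walk() {
    ( cd "$1" && find . -print0 | xargs -0 stat > /dev/null 2>&1 )
}

# Hypothetical mount point of the Gluster volume; override with MOUNTPOINT.
mountpoint=${MOUNTPOINT:-/mnt/glustervol1}
if [ -d "$mountpoint" ]; then
    stat_walk "$mountpoint" && echo "walk of $mountpoint finished"
fi
```

The exit status is that of xargs, so a non-zero result means at least one stat call failed, for example on a file that vanished during the walk.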

It would probably be wise for you to take independent advice before 
using the "sanitize" script yourself, but it certainly worked for me.
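
Because the procedure assumes the backend filesystems are already mounted with "user_xattr", it may be worth checking the mount table first. The sketch below looks for an option in an fstab-style file such as /proc/mounts. The function name has_mount_option and the brick path are hypothetical. Whether the option is listed at all depends on the filesystem and kernel, so a negative result is a prompt to look closer rather than proof that extended attributes are disabled.

```shell
#!/bin/sh
# has_mount_option MOUNTS_FILE MOUNT_POINT OPTION
# Succeeds if MOUNT_POINT appears in MOUNTS_FILE (fstab or /proc/mounts
# layout) with OPTION among its comma-separated mount options.
has_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Hypothetical backend (brick) mount point; substitute your own.
brick=${BRICK:-/mnt/brick1}
if has_mount_option /proc/mounts "$brick" user_xattr; then
    echo "$brick: user_xattr listed"
else
    echo "$brick: user_xattr not listed"
fi
```

Run on each server for each backend filesystem; the same function can be pointed at /etc/fstab to confirm that the option will survive a reboot.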

-Dan.