Problems with Gluster NFS export (unable to stat files)

fredrik at realisestudio.com (fredrik ronnvall) · Tue, 29 May 2012 17:58:21 +0100

Hi,

We're frequently seeing seemingly random errors on one of our volumes
mounted via NFS. Intermittently, a client will fail to access certain
files/directories, which then show up like this:

$ ls -l
ls: cannot access xxx: No such file or directory
ls: cannot access yyy: No such file or directory
l????????? ? ?       ?         ?                ? xxx
l????????? ? ?       ?         ?                ? yyy
drwxrwxrwx 2 user group  95 2012-05-08 18:11 zzz

Tracing the NFS mount back to one of the Gluster servers, I found the
following in nfs.log:

[2012-05-09 14:47:32.807853] E
[client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-2:
remote operation failed: No such file or directory
[2012-05-09 14:47:32.808430] E
[client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-3:
remote operation failed: No such file or directory
[2012-05-09 14:47:32.841125] E
[client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-3:
remote operation failed: No such file or directory
[2012-05-09 14:47:32.841762] E
[client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-2:
remote operation failed: No such file or directory

Restarting the gluster server seems to fix the issue, though I am
unhappy with this solution.

Today the following showed up in the logs alongside the same symptoms:
[2012-05-29 10:19:04.332031] E
[afr-self-heal-metadata.c:561:afr_sh_metadata_post_nonblocking_inodelk_cbk]
0-glustervol1-replicate-3: Non Blocking metadata inodelks failed for
<path>.
[2012-05-29 10:19:04.332059] E
[afr-self-heal-metadata.c:563:afr_sh_metadata_post_nonblocking_inodelk_cbk]
0-glustervol1-replicate-3: Metadata self-heal failed for <path>.
[2012-05-29 10:19:04.332503] E
[afr-self-heal-metadata.c:561:afr_sh_metadata_post_nonblocking_inodelk_cbk]
0-glustervol1-replicate-2: Non Blocking metadata inodelks failed for
<path>.
[2012-05-29 10:19:04.332534] E
[afr-self-heal-metadata.c:563:afr_sh_metadata_post_nonblocking_inodelk_cbk]
0-glustervol1-replicate-2: Metadata self-heal failed for <path>.
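
To see which subvolumes the errors cluster on, the log entries could be tallied with something like the sketch below. It assumes only the entry layout visible in the excerpts above (timestamp, level, source location, subvolume name); tally_errors is a hypothetical helper name, not part of Gluster.

```python
import re
from collections import Counter

# Matches the entry layout seen in the excerpts above:
#   [timestamp] LEVEL [source.c:line:function] subvolume: message
# Using \s+ between fields also matches entries that a mail client has
# wrapped across several lines.
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]:]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<subvol>\S+):"
)


def tally_errors(log_text):
    """Count error-level entries per (function, subvolume) pair."""
    counts = Counter()
    for match in ENTRY.finditer(log_text):
        if match.group("level") == "E":
            counts[(match.group("func"), match.group("subvol"))] += 1
    return counts
```

Reading the whole of nfs.log into a string and printing tally_errors(text).most_common() should show whether the failures are concentrated on a few client or replicate subvolumes.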

Restarting Gluster on the server the client was connected to resolved
the issue.

This seems to happen several times a day and is becoming a serious
issue. The problem most often affects symlinks, but regular files are
also affected.

The volume in question is configured as distributed-replicate across 4
servers (openSUSE 11.3) with 2 bricks per server. The Gluster version
is 3.2.5.
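
For relating the subvolume names in the logs to the layout, here is a rough sketch of the index arithmetic. It rests on assumptions not stated above: a replica count of 2 (suggested by 8 bricks and a replicate-3 subvolume appearing in the logs), consecutive bricks forming a replica set, and client-N subvolumes being numbered in brick order. Both function names are hypothetical.

```python
def replica_set_of(client_index, replica_count=2):
    """Index of the replicate subvolume containing a given client subvolume.

    Assumes consecutive bricks are grouped into replica sets.
    """
    return client_index // replica_count


def clients_in(replica_index, replica_count=2):
    """Client subvolume indices belonging to a given replicate subvolume."""
    start = replica_index * replica_count
    return list(range(start, start + replica_count))
```

Under those assumptions, client-2 and client-3 from nfs.log would belong to the same replica pair, which can be checked against the brick order the volume actually reports.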

Has anyone experienced similar issues? Is there a sanity check of
sorts that I could carry out?
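
One possible check, as a minimal sketch: list a directory and report every entry that is returned by the listing but cannot be lstat'ed, which is the condition that produces the question marks in the ls output above. find_unstatable is a hypothetical helper, and the directory passed to it would be whichever path shows the symptom.

```python
import os


def find_unstatable(directory):
    """Return (name, errno) for entries that are listed but cannot be lstat'ed."""
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat does not follow symlinks, so a dangling symlink still
            # succeeds; a failure means the entry itself is unreachable.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            broken.append((name, exc.errno))
    return broken
```

Running this against the same directory from several NFS clients, or through different servers, might help show whether the failure follows the client or the server it is mounted from.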

Fredrik