Hi there. I'm new to GlusterFS and currently evaluating it for production use. I have two storage servers that use JFS as the filesystem for the underlying exports, and the setup is supposed to be a replicated volume. I've been experimenting with various settings for benchmarking and trying out different failure scenarios, and in the process the export directory on node 1 has gone out of sync with node 2.

To trigger self-heal, I mounted the volume via the GlusterFS client on node 1 in a separate directory; the FUSE-mounted directory is /storage. As per the manual I tried the

    find <gluster-mount> -noleaf -print0 | xargs --null stat >/dev/null

dance, but the client log throws a bunch of errors:

###################################################################################
[2011-09-16 18:29:33.759729] E [client3_1-fops.c:1216:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: error
[2011-09-16 18:29:33.759747] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:29:33.759942] E [afr-self-heal-metadata.c:672:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Non Blocking metadata inodelks failed for /.
[2011-09-16 18:29:33.759961] E [afr-self-heal-metadata.c:674:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-GLSTORAGE-replicate-0: Metadata self-heal failed for /.
[2011-09-16 18:29:33.760167] W [rpc-common.c:64:xdr_to_generic] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d) [0x7f4702a751ad] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f4702a74de5] (-->/usr/local/lib/glusterfs/3.2.3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x52) [0x7f46ff88a572]))) 0-xdr: XDR decoding failed
[2011-09-16 18:29:33.760200] E [client3_1-fops.c:1292:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: error
[2011-09-16 18:29:33.760215] I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
[2011-09-16 18:29:33.760417] E [afr-self-heal-entry.c:2292:afr_sh_post_nonblocking_entry_cbk] 0-GLSTORAGE-replicate-0: Non Blocking entrylks failed for /.
[2011-09-16 18:29:33.760447] E [afr-self-heal-common.c:1554:afr_self_heal_completion_cbk] 0-GLSTORAGE-replicate-0: background meta-data entry self-heal failed on /
[2011-09-16 18:29:33.760808] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-GLSTORAGE-client-0: remote operation failed: Invalid argument
###################################################################################

Is this normal? The directory in question already holds 150 GB of data, so the find command is still running. Will both nodes be in sync once it finishes? From what I understand from the manual, files should be repaired as the find process walks them, or did I misinterpret that?

And if self-heal does fail, is there a failsafe method to ensure that both nodes end up in sync again?
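For reference, here is how I've been checking whether individual files are still marked for heal. If I've understood the AFR design correctly, the replicate translator tracks pending operations in trusted.afr.* extended attributes on the bricks themselves, so I've been poking at them with something like this (run as root on the brick export directory, not on the FUSE mount; /export/glstorage is a placeholder for my actual brick path, and the attribute names are my guess based on the GLSTORAGE-client-0 names in the log):

    # dump the AFR changelog xattrs for one file on the brick, hex-encoded
    getfattr -d -m trusted.afr -e hex /export/glstorage/some/file

If I read the docs right, all-zero counters mean the copies agree, and non-zero counters mean the file is still pending heal against the other brick. Please correct me if that's wrong.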
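And in case self-heal really can't recover, the only fallback I've come up with so far is to compare the bricks directly and copy the differences across from the good node, roughly like this (hostnames and paths are placeholders):

    # on each node: build a manifest of the brick contents (path + size)
    find /export/glstorage -type f -printf '%P\t%s\n' | sort > /tmp/brick.node1.list

    # after copying node 2's manifest over, see what differs
    diff /tmp/brick.node1.list /tmp/brick.node2.list

    # dry run of a direct brick-to-brick copy (-n so nothing is touched yet)
    rsync -avn /export/glstorage/ node2:/export/glstorage/

I'm wary of this, though: plain rsync -a does not copy the trusted.* extended attributes that GlusterFS keeps on the bricks (that would need -X), and I don't know whether syncing brick contents behind the translator's back is safe at all. Is there a sanctioned procedure for this?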