Output of glfsheal-gv0.log:
[2018-07-04 16:11:05.435680] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2018-07-04 16:11:05.436847] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-gv0-client-2: changing port to 49153 (from 0)
[2018-07-04 16:11:05.437722] W [MSGID: 114007] [client-handshake.c:1190:client_setvolume_cbk] 0-gv0-client-0: failed to find key 'child_up' in the options
[2018-07-04 16:11:05.437744] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/gluster/brick/brick0'.
[2018-07-04 16:11:05.437755] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2018-07-04 16:11:05.531514] I [MSGID: 108002] [afr-common.c:5312:afr_notify] 0-gv0-replicate-0: Client-quorum is met
[2018-07-04 16:11:05.531550] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
[2018-07-04 16:11:05.532115] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-gv0-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-07-04 16:11:05.537528] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-gv0-client-2: Connected to gv0-client-2, attached to remote volume '/gluster/brick/brick0'.
[2018-07-04 16:11:05.537569] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-gv0-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2018-07-04 16:11:05.544248] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-gv0-client-2: Server lk version = 1
[2018-07-04 16:11:05.547665] I [MSGID: 108031] [afr-common.c:2458:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-1
[2018-07-04 16:11:05.556948] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2018-07-04 16:11:05.577751] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2018-07-04 16:11:05.577839] I [MSGID: 104041] [glfs-resolve.c:971:__glfs_active_subvol] 0-gv0: switched to graph 6766732d-766d-3030-312d-37373932362d (0)
[2018-07-04 16:11:05.578355] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-07-04 16:11:05.579562] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-07-04 16:11:05.579776] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-gv0-client-2: remote operation failed. Path: / (00000000-0000-0000-0000-000000000000) [Invalid argument]
Removing the afr xattrs on node 3 did solve the split brain issue on root. Thank you!
On Wed, Jul 4, 2018 at 9:01 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 07/04/2018 09:20 PM, Anh Vo wrote:
I forgot to mention we're using 3.12.10
On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo <vtqanh@xxxxxxxxx> wrote:
If I run "sudo gluster volume heal gv0 split-brain latest-mtime /" I get the following:
Lookup failed on /:Invalid argument.
Volume heal failed.
Can you share the glfsheal-<volname>.log on the node where you ran this failed command?
node2 was not connected at that time, because when we connect it, within a few minutes gluster becomes almost unusable and many of our jobs fail. This morning I reconnected it and ran heal info; we have about 30000 entries to heal (15K from gfs-vm000 and 15K from gfs-vm001, 80% of them bare gfids, 20% with file names). It's not feasible for us to check each gfid individually, so we rely on gluster self-heal to handle those. The "/" entry is a concern because it prevents us from mounting NFS. We do need NFS for some of our management tasks, because the gluster FUSE mount is much slower than NFS for recursive operations like 'du'.
Do you have any suggestion for healing the metadata on '/' ?
You can manually delete the afr xattrs on node 3 as a workaround:
setfattr -x trusted.afr.gv0-client-0 gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 gluster/brick/brick0
This should remove the split-brain on root.
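For reference, a quick way to confirm the xattrs are gone and that the pending entries clear afterwards (a sketch using the same brick path and volume name as above, run on node 3):
# verify the afr xattrs on the brick root have been removed
getfattr -d -m . -e hex /gluster/brick/brick0
# trigger an index heal and re-check what is still pending
gluster volume heal gv0
gluster volume heal gv0 info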
HTH,
Ravi
Thanks
Anh
On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Hi,
What version of gluster are you using?
1. The afr xattrs on '/' indicate a meta-data split-brain. You can resolve it using one of the policies listed in https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/. For example, "gluster volume heal gv0 split-brain latest-mtime /"
2. Is the file corresponding to the other gfid (81289110-867b-42ff-ba3b-1373a187032b) present in all bricks? What do the getfattr outputs for this file indicate? (One way to check this directly on a brick is sketched after this list.)
3. As for the discrepancy in output of heal info, is node2 connected to the other nodes? Does heal info still print the details of all 3 bricks when you run it on node2 ?
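For point 2, a sketch of how to locate and inspect the file behind a bare gfid directly on a brick, assuming the standard .glusterfs layout (the brick path and gfid are the ones from this thread):
BRICK=/gluster/brick/brick0
GFID=81289110-867b-42ff-ba3b-1373a187032b
# gfids are linked under .glusterfs/<first two chars>/<next two chars>/<gfid>
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# for a regular file this is a hard link; find the named path(s) sharing the same inode
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*"
# then compare its afr xattrs across the three bricks
getfattr -d -m . -e hex "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"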
-Ravi
On 07/04/2018 01:47 AM, Anh Vo wrote:
Actually we just discovered that the heal info command was returning different things when executed on the different nodes of our 3-replica setup. When we execute it on node2 we did not see "/" reported as in split-brain, but if I execute it on node0 or node1 I am seeing:
x@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
Brick gfs-vm000:/gluster/brick/brick0
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
/ - Is in split-brain
Status: Connected
Number of entries: 2

Brick gfs-vm001:/gluster/brick/brick0
/ - Is in split-brain
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
Status: Connected
Number of entries: 2

Brick gfs-vm002:/gluster/brick/brick0
/ - Is in split-brain
Status: Connected
Number of entries: 1
I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes and I am seeing node2 has slightly different attrs:

node0:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node1:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node2:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000200000000
trusted.afr.gv0-client-1=0x000000000000000200000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
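As context for reading those values, the trusted.afr xattr packs three 4-byte counters (data / metadata / entry pending operations), so the outputs above break down roughly as follows (a sketch, assuming the standard AFR changelog format):
# trusted.afr.<vol>-client-N = 0x <data><metadata><entry>, 8 hex digits each
# node0/node1: trusted.afr.gv0-client-2 = 0x 00000000 00000001 00000000  -> pending metadata heal blaming node2's brick
# node2:       trusted.afr.gv0-client-0 = 0x 00000000 00000002 00000000  -> pending metadata heals blaming node0's brick
# node2:       trusted.afr.gv0-client-1 = 0x 00000000 00000002 00000000  -> pending metadata heals blaming node1's brick
# Each side blames the other for metadata changes on '/', i.e. a metadata split-brain on root.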
Where do I go from here? Thanks