Hi,

I have a GlusterFS volume replicated across three nodes. I am planning to use the volume as storage for VMware ESXi machines over NFS. The reason for using three nodes is to be able to configure quorum and avoid split-brains. However, during my initial testing, when I intentionally and gracefully restarted the node "ned", a split-brain/self-heal error occurred. The log on "todd" and "rod" gives:

[2013-11-29 12:34:14.614456] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gv0-replicate-0: open of <gfid:09b6d1d7-e583-4cee-93a4-4e972346ade3> failed on child gv0-client-2 (No such file or directory)

The reason is probably that the file was deleted and recreated with the same file name while the node was offline, i.e. a new inode and thus a new gfid. Is this expected? Is it possible to configure the volume to handle this automatically? The same problem happens every time I test a restart; it looks like VMware is constantly creating new lock files in the .vSphere-HA directory. For now I clear the stale entry by hand; a sketch of that procedure follows the brick listings below.

Below you will find various information about the GlusterFS volume. I have also attached the full logs for all three nodes.

[root@todd ~]# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: a847a533-9509-48c5-9c18-a40b48426fbc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: todd-storage:/data/gv0
Brick2: rod-storage:/data/gv0
Brick3: ned-storage:/data/gv0
Options Reconfigured:
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%
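For reference, the quorum options above were set with commands along these lines (from my notes; I believe the ratio is a cluster-wide option and is therefore set on "all" rather than on gv0, please correct me if that is wrong):

[root@todd ~]# gluster volume set gv0 cluster.server-quorum-type server
[root@todd ~]# gluster volume set all cluster.server-quorum-ratio 51%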
[root@todd ~]# gluster volume heal gv0 info
Gathering Heal info on volume gv0 has been successful

Brick todd-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick rod-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick ned-storage:/data/gv0
Number of entries: 0

[root@todd ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3

[root@todd ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 1191    Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.285091183 +0100
Modify: 2013-11-29 13:26:24.668822831 +0100
Change: 2013-11-29 13:26:24.668822831 +0100

[root@rod ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3

[root@rod ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 1558    Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.284671510 +0100
Modify: 2013-11-29 13:26:24.668985155 +0100
Change: 2013-11-29 13:26:24.669985185 +0100

[root@ned ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x76caf49a25d74ebdb711a562412bee43

[root@ned ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 4545    Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:34:45.199330329 +0100
Modify: 2013-11-29 11:37:03.773330311 +0100
Change: 2013-11-29 11:37:03.773330311 +0100
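Note that todd and rod agree on trusted.gfid (0x09b6d1d7...) and both carry pending flags in trusted.afr.gv0-client-2, while ned's copy has a different gfid (0x76caf49a...), so my assumption is that ned's copy is the stale one. The manual cleanup I mentioned above is roughly the following; this is only a sketch under that assumption, not a verified procedure (the .glusterfs path is the hard link GlusterFS keeps for the file's gfid, named by its first two byte pairs):

[root@ned ~]# cd /data/gv0
# assumption: ned's copy is stale; double-check trusted.gfid on all bricks first
[root@ned ~]# rm production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
# also remove the matching gfid hard link under .glusterfs
[root@ned ~]# rm .glusterfs/76/ca/76caf49a-25d7-4ebd-b711-a562412bee43
# trigger self-heal so the file is recreated from todd and rod
[root@ned ~]# gluster volume heal gv0

Doing this after every restart is obviously not something I want in production, hence my question about handling it automatically.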
Regards,
Marcus Wellhardh

Attachment: glusterfs-logs.tgz