Self heal problem


 



Hi,

I have a GlusterFS volume replicated across three nodes. I am planning to
use the volume as storage for VMware ESXi machines over NFS. The reason
for using three nodes is to be able to configure quorum and avoid
split-brains. However, during my initial testing, when I intentionally
and gracefully restarted the node "ned", a split-brain/self-heal error
occurred.

The log on "todd" and "rod" gives:

  [2013-11-29 12:34:14.614456] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gv0-replicate-0: open of <gfid:09b6d1d7-e583-4cee-93a4-4e972346ade3> failed on child gv0-client-2 (No such file or directory)

The reason is probably that the file was deleted and recreated with the
same file name while the node was offline, i.e. it got a new inode and
thus a new gfid.
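For reference, the manual cleanup commonly suggested for this kind of gfid
mismatch looks roughly like the following. This is only a sketch: it assumes
the copy on "ned" is the stale one, and the .glusterfs hard-link path is
derived from the gfid that ned reports further down in this mail.

```shell
# Sketch of a manual gfid-mismatch cleanup, run on ned (the brick assumed
# to hold the stale copy). Paths are taken from the getfattr/stat output
# in this mail; verify them before deleting anything.
BRICK=/data/gv0
FILE="production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb"

# Remove the stale copy and its hard link under .glusterfs
# (ned's gfid is 76caf49a-25d7-4ebd-b711-a562412bee43):
rm "$BRICK/$FILE"
rm "$BRICK/.glusterfs/76/ca/76caf49a-25d7-4ebd-b711-a562412bee43"

# Trigger a heal so the good copy is replicated back:
gluster volume heal gv0
```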

Is this expected? Is it possible to configure the volume to
automatically handle this?

The same problem happens every time I test a restart. It looks like
VMware is constantly creating new lock files in the vSphere-HA
directory.

Below you will find various information about the glusterfs volume. I
have also attached the full logs for all three nodes. 

[root@todd ~]# gluster volume info
 
Volume Name: gv0
Type: Replicate
Volume ID: a847a533-9509-48c5-9c18-a40b48426fbc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: todd-storage:/data/gv0
Brick2: rod-storage:/data/gv0
Brick3: ned-storage:/data/gv0
Options Reconfigured:
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%
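Note that the server quorum configured above only controls whether bricks
stay up; client-side quorum, which blocks writes when fewer than a majority
of replicas are reachable, is a separate option. A sketch of enabling it
(I have not verified this changes the behaviour described here):

```shell
# Client-side quorum: with "auto" on a 3-way replica, writes are refused
# unless at least two of the three bricks are reachable.
gluster volume set gv0 cluster.quorum-type auto
```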

[root@todd ~]# gluster volume heal gv0 info 
Gathering Heal info on volume gv0 has been successful

Brick todd-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick rod-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick ned-storage:/data/gv0
Number of entries: 0

[root@todd ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3
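For readers unfamiliar with the AFR changelog format: each trusted.afr.*
value above is 12 bytes, holding three big-endian 32-bit counters of
pending data, metadata, and entry operations against the named brick. A
small sketch to decode the values shown in this mail:

```python
def decode_afr_changelog(hex_value):
    """Decode a trusted.afr.* xattr into (data, metadata, entry) counters.

    The value is 12 bytes: three big-endian 32-bit counters recording
    pending data, metadata, and entry operations against the named brick.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    assert len(raw) == 12, "expected a 12-byte AFR changelog value"
    data = int.from_bytes(raw[0:4], "big")
    metadata = int.from_bytes(raw[4:8], "big")
    entry = int.from_bytes(raw[8:12], "big")
    return data, metadata, entry

# The value todd and rod hold for gv0-client-2 (ned's brick):
print(decode_afr_changelog("0x000002810000000100000000"))  # (641, 1, 0)
```

So todd and rod each record 641 pending data operations and 1 pending
metadata operation against ned's brick, while ned's own counters are all
zero.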

[root@todd ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 1191        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.285091183 +0100
Modify: 2013-11-29 13:26:24.668822831 +0100
Change: 2013-11-29 13:26:24.668822831 +0100

[root@rod ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3

[root@rod ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 1558        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.284671510 +0100
Modify: 2013-11-29 13:26:24.668985155 +0100
Change: 2013-11-29 13:26:24.669985185 +0100

[root@ned ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x76caf49a25d74ebdb711a562412bee43

[root@ned ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 4545        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:34:45.199330329 +0100
Modify: 2013-11-29 11:37:03.773330311 +0100
Change: 2013-11-29 11:37:03.773330311 +0100

Regards,
Marcus Wellhardh

Attachment: glusterfs-logs.tgz
Description: application/compressed-tar

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
