Re: Self heal problem

Hi,

I did a trivial test to verify my delete/recreate theory (the rough
commands I used are shown after the steps below):

  1) File exists on all nodes.
  2) One node is powered down.
  3) File is deleted and recreated with same filename.
  4) Failing node is restarted.
  5) Self heal worked on the modified file.
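
For the record, this is roughly what I ran; the mount point and test
file name are just what I happened to use:

  mount -t glusterfs todd-storage:/gv0 /mnt/gv0
  echo one > /mnt/gv0/testfile        # 1) file exists on all nodes
                                      # 2) power down one node, then:
  rm /mnt/gv0/testfile                # 3) delete and recreate with
  echo two > /mnt/gv0/testfile        #    the same filename
                                      # 4) restart the failed node
  gluster volume heal gv0 info        # 5) entry heals cleanly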

GlusterFS handled the above scenario perfectly. So the question is: why
does self heal fail on the vSphere-HA lock file? Does anyone have any
ideas for troubleshooting this?
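
My current workaround idea, in case someone can confirm it: remove the
stale copy and its .glusterfs hard link directly on ned's brick, then
let self heal recreate the file. This is untested so far, and it
assumes the copy on ned (gfid 76caf49a-25d7-4ebd-b711-a562412bee43,
see the getfattr output quoted below) really is the stale one:

  # On ned, directly on the brick (never via a client mount):
  GFID=76caf49a-25d7-4ebd-b711-a562412bee43
  BRICK=/data/gv0
  rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
  rm "$BRICK/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb"
  gluster volume heal gv0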

I am using:

  glusterfs-3.4.1-3.el6.x86_64
  CentOS release 6.4

Regards,
Marcus

On Fri, 2013-11-29 at 14:05 +0100, Marcus Wellhardh wrote: 
> Hi,
> 
> I have a glusterfs volume replicated on three nodes. I am planning to
> use the volume as storage for VMware ESXi machines over NFS. The
> reason for using three nodes is to be able to configure quorum and
> avoid split-brains. However, during my initial testing, when I
> intentionally and gracefully restarted the node "ned", a
> split-brain/self-heal error occurred.
> 
> The logs on "todd" and "rod" show:
> 
>   [2013-11-29 12:34:14.614456] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gv0-replicate-0: open of <gfid:09b6d1d7-e583-4cee-93a4-4e972346ade3> failed on child gv0-client-2 (No such file or directory)
> 
> The reason is probably that the file was deleted and recreated with
> the same file name while the node was offline, i.e. it got a new
> inode and thus a new gfid.
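
Annotating my own mail here: a quick way to see the gfid mismatch on
each brick, without the full xattr dump shown further down, is:

  getfattr -n trusted.gfid -e hex \
    /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb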
> 
> Is this expected? Is it possible to configure the volume to
> automatically handle this?
> 
> The same problem happens every time I test a restart. It looks like
> VMware is constantly creating new lock files in the vSphere-HA
> directory.
> 
> Below you will find various information about the glusterfs volume. I
> have also attached the full logs for all three nodes. 
> 
> [root@todd ~]# gluster volume info
>  
> Volume Name: gv0
> Type: Replicate
> Volume ID: a847a533-9509-48c5-9c18-a40b48426fbc
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: todd-storage:/data/gv0
> Brick2: rod-storage:/data/gv0
> Brick3: ned-storage:/data/gv0
> Options Reconfigured:
> cluster.server-quorum-type: server
> cluster.server-quorum-ratio: 51%
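
For reference, I set those quorum options with something like the
following; the ratio option is cluster-wide, hence "all":

  gluster volume set gv0 cluster.server-quorum-type server
  gluster volume set all cluster.server-quorum-ratio 51%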
> 
> [root@todd ~]# gluster volume heal gv0 info 
> Gathering Heal info on volume gv0 has been successful
> 
> Brick todd-storage:/data/gv0
> Number of entries: 2
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> 
> Brick rod-storage:/data/gv0
> Number of entries: 2
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> 
> Brick ned-storage:/data/gv0
> Number of entries: 0
> 
> [root@todd ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000002810000000100000000
> trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3
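
As I understand the AFR changelog format, each trusted.afr value is
three big-endian 32-bit counters: pending data, metadata and entry
operations. Decoding the non-zero value above in plain bash:

  x=000002810000000100000000
  printf 'data=%d metadata=%d entry=%d\n' \
      0x${x:0:8} 0x${x:8:8} 0x${x:16:8}
  # -> data=641 metadata=1 entry=0

So todd (and rod, below) blame gv0-client-2 (ned) for pending data and
metadata operations, while ned's own copy has all-zero changelogs and
a different gfid.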
> 
> [root@todd ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 1191        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:38:36.285091183 +0100
> Modify: 2013-11-29 13:26:24.668822831 +0100
> Change: 2013-11-29 13:26:24.668822831 +0100
> 
> [root@rod ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000002810000000100000000
> trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3
> 
> [root@rod ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 1558        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:38:36.284671510 +0100
> Modify: 2013-11-29 13:26:24.668985155 +0100
> Change: 2013-11-29 13:26:24.669985185 +0100
> 
> [root@ned ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000000000000000000000000
> trusted.gfid=0x76caf49a25d74ebdb711a562412bee43
> 
> [root@ned ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 4545        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:34:45.199330329 +0100
> Modify: 2013-11-29 11:37:03.773330311 +0100
> Change: 2013-11-29 11:37:03.773330311 +0100
> 
> Regards,
> Marcus Wellhardh


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



