Re: [Gluster-users] split-brain sanlock ids automation

On 05/26/2015 08:48 AM, maokx@xxxxxxxx wrote:
Hi all,
I want to automate the resolution of split-brain on the sanlock ids file.
The split-brain is caused by network interruptions.
My manual fix is:
find /data/data1/ -samefile /data/data1/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids -print -delete
The problem is triggered by sanlock, for example: sanlock add_lockspace -s lockspace1:1:/dom_md/ids :0
Now I want to automate this fix.
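As a cautious starting point for automating the manual fix above, here is a minimal sketch that separates listing from deleting. On a brick, a regular file normally has a hard link under the brick's .glusterfs directory, which is why -samefile reports more than one path. The function names and the BRICK/REL arguments are hypothetical and not part of gluster.

```shell
#!/bin/sh
# Sketch only: list, and optionally delete, a file plus its hard links on one brick.
# BRICK is the brick root (e.g. /data/data1); REL is the file path relative to it.

# list_links BRICK REL: print every path on the brick that shares REL's inode.
list_links() {
    brick=$1
    rel=$2
    find "$brick" -samefile "$brick/$rel" -print
}

# remove_links BRICK REL: same search, but delete each match after printing it.
# Run this only on the brick whose copy you have decided to discard.
remove_links() {
    brick=$1
    rel=$2
    find "$brick" -samefile "$brick/$rel" -print -delete
}
```

Running list_links first, for example list_links /data/data1 3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids, shows exactly what remove_links would delete.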


You can use the gluster CLI commands (glusterfs 3.6 onwards) or the getfattr/setfattr commands from the mount (glusterfs 3.7 onwards) to heal files that are in split-brain. Usage is documented at https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md
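To make the CLI route concrete, here is a dry-run sketch that only prints the commands instead of executing them. The volume name pool01 and the file path are taken from the log in this thread; the brick name server1:/data/data1 is a hypothetical placeholder for whichever brick holds the copy you trust.

```shell
#!/bin/sh
# Dry-run sketch: print the split-brain CLI commands rather than running them.
VOLUME=pool01   # from the log prefix "0-pool01-replicate-0"
FILE=/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids   # path relative to the volume root

# Print the command that lists files the volume reports as split-brain.
show_split_brain() {
    echo "gluster volume heal $1 info split-brain"
}

# Print the command that heals one file from a chosen brick.
# $2 is HOSTNAME:BRICKPATH of the trusted brick (hypothetical value below).
heal_from_brick() {
    echo "gluster volume heal $1 split-brain source-brick $2 $3"
}

show_split_brain "$VOLUME"
heal_from_brick "$VOLUME" "server1:/data/data1" "$FILE"
```

Once the printed commands look right for your setup, they can be run by hand or piped to sh. Choosing the source brick is the part that still needs a policy of your own.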
If you are using oVirt (and hence the sanlock file) with gluster, you can get better split-brain protection by using replica 3 volumes with cluster.quorum-type set to auto.
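The reason replica 3 helps can be shown with a little arithmetic. As I understand it, with cluster.quorum-type set to auto, writes are allowed only while a majority of the bricks in the replica set is reachable (with an even count, exactly half also qualifies if it includes the first brick). The helper below is a hypothetical illustration of the majority rule, not gluster code.

```shell
#!/bin/sh
# Sketch of the majority rule behind cluster.quorum-type=auto (illustration only).

# min_up REPLICAS: smallest number of reachable bricks that forms a strict majority.
min_up() {
    echo $(( $1 / 2 + 1 ))
}

# The option itself is set with the standard volume-set command, e.g.:
#   gluster volume set pool01 cluster.quorum-type auto
min_up 3    # prints 2: one brick may be cut off and writes continue on the other two
min_up 2    # prints 2: a strict majority of two bricks means both must be reachable
```

With three bricks, a single brick isolated by a network interruption cannot accept writes on its own, so the two sides cannot diverge the way they did in the log below.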

Hope that helps,
Ravi

My log is:

[root@www ~]# tail -f /var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.7.246\:_pool01.log 

[2015-05-25 06:37:33.613126] W [page.c:991:__ioc_page_error] 0-pool01-io-cache: page error for page = 0x7f4b9801d2d0 & waitq = 0x7f4b9801dd70

[2015-05-25 06:37:33.613157] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 246963: READ => -1 (Input/output error)

[2015-05-25 06:37:37.422720] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-pool01-replicate-0: Unable to self-heal contents of '/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 4 ] [ 4 0 ] ]

[2015-05-25 06:37:37.422981] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-pool01-replicate-0: background  data self-heal failed on /3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids

[2015-05-25 06:37:37.423231] W [page.c:991:__ioc_page_error] 0-pool01-io-cache: page error for page = 0x7f4b9800c520 & waitq = 0x7f4b9801aff0

[2015-05-25 06:37:37.423259] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 246978: READ => -1 (Input/output error)

[2015-05-25 06:37:43.650389] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-pool01-replicate-0: Unable to self-heal contents of '/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 4 ] [ 4 0 ] ]

[2015-05-25 06:37:43.650740] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-pool01-replicate-0: background  data self-heal failed on /3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids

[2015-05-25 06:37:43.650994] W [page.c:991:__ioc_page_error] 0-pool01-io-cache: page error for page = 0x7f4b9802adc0 & waitq = 0x7f4b98017e50

[2015-05-25 06:37:43.651021] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 246997: READ => -1 (Input/output error)

[2015-05-25 06:37:50.931622] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-pool01-replicate-0: Unable to self-heal contents of '/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 4 ] [ 4 0 ] ]

[2015-05-25 06:37:50.931906] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-pool01-replicate-0: background  data self-heal failed on /3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids

[2015-05-25 06:37:50.932211] W [page.c:991:__ioc_page_error] 0-pool01-io-cache: page error for page = 0x7f4b980211e0 & waitq = 0x7f4b980065e0

[2015-05-25 06:37:50.932240] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 247012: READ => -1 (Input/output error)

[2015-05-25 06:37:53.688445] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-pool01-replicate-0: Unable to self-heal contents of '/3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 4 ] [ 4 0 ] ]

[2015-05-25 06:37:53.688821] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-pool01-replicate-0: background  data self-heal failed on /3b9e1af1-f452-4ed4-9cbf-a38d8f7a395e/dom_md/ids

[2015-05-25 06:37:53.689128] W [page.c:991:__ioc_page_error] 0-pool01-io-cache: page error for page = 0x7f4b9800acb0 & waitq = 0x7f4b9802b4d0

[2015-05-25 06:37:53.689152] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 247031: READ => -1 (Input/output error)
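For automation, the affected paths can be pulled out of a log like the one above. The sketch below matches the message text exactly as it appears here; other glusterfs versions may word it differently, and the function name is hypothetical. In the pending matrix [ [ 0 4 ] [ 4 0 ] ], each brick records pending operations against the other, which is why neither copy can be picked automatically.

```shell
#!/bin/sh
# split_brain_paths: read client log lines on stdin and print each
# path reported as "possible split-brain" once.
split_brain_paths() {
    sed -n "s/.*Unable to self-heal contents of '\([^']*\)' (possible split-brain).*/\1/p" | sort -u
}
```

For example, split_brain_paths < /var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.7.246:_pool01.log would list the ids file once, however many times the error was logged.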

My environment is as follows:





_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
