When I check the file sizes and sha1sum on the bricks, 2 of
the 3 bricks have the same value, so by quorum logic I expected
the first brick to be healed from the other two. But I don't
see that happening. Can someone please tell me whether this is
expected behavior, or whether I have something misconfigured?
Ramesh
My config is as below.
[root@ip-172-31-12-218 ~]# gluster volume info
Volume Name: PL1
Type: Replicate
Volume ID: a7aabae0-c6bc-40a9-8b26-0498d488ee39
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.31.38.189:/data/vol1/gluster-data
Brick2: 172.31.16.220:/data/vol1/gluster-data
Brick3: 172.31.12.218:/data/vol1/gluster-data
Options Reconfigured:
performance.cache-size: 2147483648
nfs.addr-namelookup: off
network.ping-timeout: 12
cluster.server-quorum-type: server
nfs.enable-ino32: on
cluster.quorum-type: auto
cluster.server-quorum-ratio: 51%
Volume Name: PL2
Type: Replicate
Volume ID: fadb3671-7a92-40b7-bccd-fbacf672f6dc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.31.38.189:/data/vol2/gluster-data
Brick2: 172.31.16.220:/data/vol2/gluster-data
Brick3: 172.31.12.218:/data/vol2/gluster-data
Options Reconfigured:
performance.cache-size: 2147483648
nfs.addr-namelookup: off
network.ping-timeout: 12
cluster.server-quorum-type: server
nfs.enable-ino32: on
cluster.quorum-type: auto
cluster.server-quorum-ratio: 51%
[root@ip-172-31-12-218 ~]#
I have 2 clients, each mounting one of the volumes. At
no time is the same volume mounted by more than 1
client.
mount -t glusterfs -o defaults,enable-ino32,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log,backupvolfile-server=172.31.38.189,backupvolfile-server=172.31.12.218,background-qlen=256 172.31.16.220:/PL2 /mnt/vm
I restarted Brick 1 (172.31.38.189), and when it came back
up, one of the files on the PL2 volume went into split-brain.
[2014-09-05 17:59:42.997308] W [afr-open.c:209:afr_open] 0-PL2-replicate-0: failed to open as split brain seen, returning EIO
[2014-09-05 17:59:42.997350] W [fuse-bridge.c:2209:fuse_writev_cbk] 0-glusterfs-fuse: 3359683: WRITE => -1 (Input/output error)
[2014-09-05 17:59:42.997476] W [fuse-bridge.c:690:fuse_truncate_cbk] 0-glusterfs-fuse: 3359684: FTRUNCATE() ERR => -1 (Input/output error)
[2014-09-05 17:59:42.997647] W [fuse-bridge.c:2209:fuse_writev_cbk] 0-glusterfs-fuse: 3359686: WRITE => -1 (Input/output error)
[2014-09-05 17:59:42.997783] W [fuse-bridge.c:1214:fuse_err_cbk] 0-glusterfs-fuse: 3359687: FLUSH() ERR => -1 (Input/output error)
[2014-09-05 17:59:44.009187] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-PL2-replicate-0: Unable to self-heal contents of '/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 1 ] [ 3398 0 0 ] [ 3398 0 0 ] ]
[2014-09-05 17:59:44.011116] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-PL2-replicate-0: backgroung data self heal failed, on /apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
[2014-09-05 17:59:44.011480] W [afr-open.c:209:afr_open] 0-PL2-replicate-0: failed to open as split brain seen, returning EIO
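The pending matrix in the self-heal error above can be read row by row: as far as I understand it, row i lists how many operations brick i has recorded as pending on each of the other bricks. The sketch below is only a simplified reading of that matrix, not the actual AFR self-heal code. The function name mutual_blame_pairs is made up for illustration, and the matrix is copied from the log line.

```python
def mutual_blame_pairs(matrix):
    """Return index pairs (i, j) where brick i and brick j each record
    pending operations against the other.

    Assumption: matrix[i][j] > 0 means "brick i blames brick j".
    This is a simplified model for illustration only.
    """
    pairs = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > 0 and matrix[j][i] > 0:
                pairs.append((i, j))
    return pairs


# Pending matrix copied from the log entry above.
pending = [
    [0, 1, 1],      # brick 1 (index 0) blames bricks 2 and 3
    [3398, 0, 0],   # brick 2 (index 1) blames brick 1
    [3398, 0, 0],   # brick 3 (index 2) blames brick 1
]

print(mutual_blame_pairs(pending))
```

Under this reading, brick 1 and each of the other two bricks accuse one another, which would leave no brick that every other brick agrees is clean. That would be consistent with the heal being refused even though two of the three copies have identical contents.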
Starting time of crawl: Fri Sep 5 17:55:32 2014
Ending time of crawl: Fri Sep 5 17:55:33 2014
Type of crawl: INDEX
No. of entries healed: 4
No. of entries in split-brain: 1
No. of heal failed entries: 0
[root@ip-172-31-16-220 ~]# gluster volume heal PL2 info
Brick ip-172-31-38-189:/data/vol2/gluster-data/
/apache_cp_mm1/logs/mm1.access_log.2014-09-05-17_00_00
Number of entries: 1
Brick ip-172-31-16-220:/data/vol2/gluster-data/
/apache_cp_mm1/logs/mm1.access_log.2014-09-05-17_00_00
Number of entries: 1
Brick ip-172-31-12-218:/data/vol2/gluster-data/
/apache_cp_mm1/logs/mm1.access_log.2014-09-05-17_00_00
Number of entries: 1
BRICK 1
=======
[root@ip-172-31-38-189 ~]# sha1sum access_log.2014-09-05-17_00_00
aa72d0f3949700f67b61d3c58fdbc75b772d607b  access_log.2014-09-05-17_00_00
[root@ip-172-31-38-189 ~]# ls -al
total 12760
dr-xr-x--- 3 root root 4096 Sep 5 17:42 .
dr-xr-xr-x 24 root root 4096 Sep 5 17:34 ..
-rw-r----- 1 root root 13019808 Sep 5 17:42 access_log.2014-09-05-17_00_00
[root@ip-172-31-38-189 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
getfattr: Removing leading '/' from absolute path names
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
trusted.afr.PL2-client-0=0x000000000000000000000000
trusted.afr.PL2-client-1=0x000000010000000000000000
trusted.afr.PL2-client-2=0x000000010000000000000000
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
BRICK 2
=======
[root@ip-172-31-16-220 ~]# sha1sum access_log.2014-09-05-17_00_00
0f7b72f77a792b5c2b68456c906cf7b93287f0d6  access_log.2014-09-05-17_00_00
[root@ip-172-31-16-220 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
getfattr: Removing leading '/' from absolute path names
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
trusted.afr.PL2-client-0=0x00000d460000000000000000
trusted.afr.PL2-client-1=0x000000000000000000000000
trusted.afr.PL2-client-2=0x000000000000000000000000
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
BRICK 3
=========
[root@ip-172-31-12-218 ~]# sha1sum access_log.2014-09-05-17_00_00
0f7b72f77a792b5c2b68456c906cf7b93287f0d6  access_log.2014-09-05-17_00_00
[root@ip-172-31-12-218 ~]# getfattr -d -m . -e hex /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
getfattr: Removing leading '/' from absolute path names
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00
trusted.afr.PL2-client-0=0x00000d460000000000000000
trusted.afr.PL2-client-1=0x000000000000000000000000
trusted.afr.PL2-client-2=0x000000000000000000000000
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2
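To make the trusted.afr.* values easier to compare, here is a small sketch that decodes them. It assumes the usual AFR changelog layout of 12 bytes holding three big-endian 32-bit counters (pending data, metadata and entry operations, in that order). The helper name decode_afr_changelog is made up for illustration, and the hex strings are copied from the getfattr output in this post.

```python
import struct


def decode_afr_changelog(hex_value):
    """Decode a trusted.afr.* xattr value into (data, metadata, entry).

    Assumption: the value is 12 bytes, three big-endian unsigned
    32-bit pending-operation counters.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte changelog value")
    return struct.unpack(">III", raw)


# Values copied from the getfattr output on each brick.
xattrs = {
    "brick1": {
        "trusted.afr.PL2-client-0": "0x000000000000000000000000",
        "trusted.afr.PL2-client-1": "0x000000010000000000000000",
        "trusted.afr.PL2-client-2": "0x000000010000000000000000",
    },
    "brick2": {
        "trusted.afr.PL2-client-0": "0x00000d460000000000000000",
        "trusted.afr.PL2-client-1": "0x000000000000000000000000",
        "trusted.afr.PL2-client-2": "0x000000000000000000000000",
    },
    "brick3": {
        "trusted.afr.PL2-client-0": "0x00000d460000000000000000",
        "trusted.afr.PL2-client-1": "0x000000000000000000000000",
        "trusted.afr.PL2-client-2": "0x000000000000000000000000",
    },
}

for brick, attrs in xattrs.items():
    for name, value in attrs.items():
        data, metadata, entry = decode_afr_changelog(value)
        print(brick, name, "data=%d metadata=%d entry=%d" % (data, metadata, entry))
```

Decoded this way, bricks 2 and 3 each record 3398 pending data operations against client-0 (brick 1), while brick 1 records 1 pending data operation against each of client-1 and client-2. These are the same numbers that appear in the pending matrix in the client log.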