Hi all,
I work at a small, 8-person company that uses
Gluster for its primary data storage. We have a
volume called "data" that is replicated over two
servers (details below). This worked perfectly for
over a year, but lately we have been noticing files
whose contents differ between the two bricks, so it
seems there has been a split-brain situation that is
not being detected or resolved. I have two questions
about this:
1) I expected Gluster to (eventually) detect a
situation like this; why doesn't it?
2) How do I fix this situation? I've tried an
explicit 'heal', but that didn't seem to change
anything.
Thanks a lot for your help!
Sjors
------8<------
curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681  /export/sdb1/data/Case/21000355/studies.dat
# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat
I expected an input/output error when reading this
file, because of the split-brain, but got none. There
are no related entries in the GlusterFS logs on
either bonaire or curacao.
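I have not yet looked at the replication metadata on
the bricks directly, but as far as I understand the
AFR changelog and gfid xattrs can be dumped with
getfattr, so I assume something like this would show
whether the two bricks are actually blaming each
other (same paths as above; I have not run this yet):

curacao# getfattr -d -m . -e hex /export/sdb1/data/Case/21000355/studies.dat
bonaire# getfattr -d -m . -e hex /export/sdb1/data/Case/21000355/studies.dat

If I understand the naming correctly, the attributes
to compare would be trusted.gfid (which should be
identical on both bricks) and trusted.afr.data-client-0
/ trusted.afr.data-client-1 (non-zero pending counters
on both sides would point to a genuine split-brain).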
bonaire# gluster volume heal data full
Launching heal operation to perform full self
heal on volume data has been successful
Use heal info commands to check status
bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0
Brick curacao:/export/sdb1/data/
Number of entries: 0
(Same output on curacao, and hours after this,
the md5sums on both bricks still differ.)
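I believe there is also a split-brain-specific
listing; I have not included its output here, but I
would run it as:

bonaire# gluster volume heal data info split-brain

If that also reports 0 entries on both bricks, then I
suppose AFR does not consider these files to be in
split-brain at all, which might explain why reads
succeed without an error.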
curacao# gluster --version
glusterfs 3.6.2 built on Mar 2 2015 14:05:34
(Same version on bonaire.)
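For completeness, if the exact volume layout matters,
I can also post the output of:

curacao# gluster volume info data

As far as I know it is a plain two-brick replica
(replica count 2, one brick per server), as described
above.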