Ah, that's really weird. I'm pretty sure nothing ever wrote directly to /export on either machine, so I wonder how the hard links ended up being split. I'll clean up the .glusterfs directory as suggested and keep close tabs on Gluster's repair.
glustershd.log and the client mount logs (data.log and gluster.log, at least) on the client are empty, and nothing appears in them when I read the mismatching studies.dat file.
Thanks for your help!
Sjors
On Sun, 7 Jun 2015 at 22:10, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
(oops... I hate when I reply off-list)
That warning should, imho, be an error. That's saying that the handle, which should be a hardlink to the file, doesn't have a matching inode. It should if it's a hardlink.
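For context, the handle lives under the brick's .glusterfs directory at a path derived from the file's gfid (first two hex characters, then the next two, then the full gfid). A rough way to check the expected pairing, using the gfid reported further down in this thread as an example:

# Derived from trusted.gfid=0xfb34574974cf4804b8b80789738c0f81 (taken from
# this thread). For a healthy file, both paths show the same inode number
# and a link count of at least 2.
ls -li /export/sdb1/data/Case/21000355/studies.dat \
       /export/sdb1/data/.glusterfs/fb/34/fb345749-74cf-4804-b8b8-0789738c0f81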
If it were me, I would:
find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm
This would clean up any handles that are not hardlinked where they should be, allowing gluster to repair them.
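A cautious variant is to run the same find without the delete first and review what it matches; the sketch below assumes the same brick path:

# Dry run: list regular files under .glusterfs whose link count is 1, i.e.
# handles that are no longer hardlinked to any data file.
find /export/sdb1/data/.glusterfs -type f -links 1 -print

# If the list looks sane, remove them (xargs -0 matches find's -print0).
find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm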
Btw, the self-heal errors would be in glustershd.log and/or the client mount log(s), not (usually) the brick logs.
On 06/07/2015 12:21 PM, Sjors Gielen wrote:
Oops! I accidentally ran the command as non-root on Curacao; that's why there was no output. The actual output is:
curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
For reference, the output on bonaire:
bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
On Sun, 7 Jun 2015 at 21:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
I'm reading about quorums, I haven't set up anything like that yet.
(In reply to Joe Julian, who responded off-list)
The output of getfattr on bonaire:
bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81
On curacao, the command gives no output.
From `gluster volume status`, it seems that while the brick "curacao:/export/sdb1/data" is listed as online, it has no associated port number. Curacao can connect to the port number provided by Bonaire just fine. There are no firewalls on or between the two machines; they are on the same subnet, connected by Ethernet cables and two switches.
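For reference, a quick way to double-check from the shell whether curacao's brick process is listening and reachable (the port below is a placeholder, not taken from this setup; use whatever `gluster volume status data` eventually reports):

# On curacao: is a glusterfsd brick process listening at all?
ss -tlnp | grep glusterfsd

# From bonaire or a client: can the reported brick port be reached?
# (49152 is a placeholder value.)
nc -zv curacao 49152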
By the way, warning messages saying "mismatching ino/dev between file X and handle Y" have just started appearing in /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, possibly only just now, even though I started the full self-heal hours ago.
[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching ino/dev between file /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd (9190215976/2065)
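The two numbers in parentheses are inode and device, so the mismatch can be confirmed directly on the brick by stat-ing both paths from the warning:

# For a proper hardlink pair, device:inode and link count would match;
# the warning above says the inodes differ (9127104621 vs 9190215976).
stat -c '%d:%i links=%h %n' \
    /export/sdb1/data/Archive/S21/21008971/studies.dat \
    /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd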
Thanks again!
Sjors
On Sun, 7 Jun 2015 at 19:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
Hi all,
I work at a small, 8-person company that uses Gluster for its primary data storage. We have a volume called "data" that is replicated over two servers (details below). This worked perfectly for over a year, but lately we've been noticing some mismatches between the two bricks, so it seems there has been some split-brain situation that is not being detected or resolved. I have two questions about this:
1) I expected Gluster to (eventually) detect a situation like this; why doesn't it?
2) How do I fix this situation? I've tried an explicit 'heal', but that didn't seem to change anything.
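On the detection question, one check worth running (assuming the heal CLI in this GlusterFS release, which also offers an info split-brain sub-command) is whether Gluster itself reports any split-brain entries:

# List entries the self-heal daemon has flagged as split-brain, per brick.
gluster volume heal data info split-brain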
Thanks a lot for your help!
Sjors
------8<------
Volume & peer info: http://pastebin.com/PN7tRXdU

curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat

bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681  /export/sdb1/data/Case/21000355/studies.dat

# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat
I expected an input/output error after reading this file, because of the split-brain situation, but got none. There are no entries in the GlusterFS logs of either bonaire or curacao.
bonaire# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status

bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0

Brick curacao:/export/sdb1/data/
Number of entries: 0
(Same output on curacao, and hours after this, the md5sums on both bricks still differ.)
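If only a handful of files are affected, one low-impact thing to try is to look the file up through a client mount, which makes the replicate translator re-examine both copies; a hedged suggestion, since it only helps if the changelog xattrs actually flag a pending heal:

# On a client such as mallorca: stat the file through the FUSE mount to
# trigger a self-heal check for just this file.
stat /data/Case/21000355/studies.dat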
curacao# gluster --version
glusterfs 3.6.2 built on Mar 2 2015 14:05:34
Repository revision: git://git.gluster.com/glusterfs.git
(Same version on Bonaire)