Re: Gluster does not seem to detect a split-brain situation

Ah, that's really weird. I'm pretty sure that nothing ever wrote to /export directly on either machine, so I wonder how the hardlinks ended up being split. I'll indeed clean up the .glusterfs directory and keep close tabs on Gluster's repair.
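I'll keep an eye on the repair by re-running the heal status and tailing the self-heal daemon log every so often, roughly:

    bonaire# gluster volume heal data info
    bonaire# tail -f /var/log/glusterfs/glustershd.log

(log path assuming the default location).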

Glustershd.log and the client mount logs on the client (at least data.log and gluster.log) are empty, and nothing appears in them when I read the mismatching studies.dat file.

Thanks for your help!
Sjors

On Sun, Jun 7, 2015 at 22:10, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
(oops... I hate when I reply off-list)

That warning should, imho, be an error. That's saying that the handle, which should be a hardlink to the file, doesn't have a matching inode. It should if it's a hardlink.
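One way to verify that for a given file is to compare inode numbers and link counts between the file and its handle, for example with the two paths from the warning you pasted:

    stat -c '%i %h %n' \
        /export/sdb1/data/Archive/S21/21008971/studies.dat \
        /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd

If the inode numbers differ, or the link count is 1, the handle is not a hardlink to the file anymore.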

If it were me, I would:

    find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm

This would clean up any handles that are not hardlinked where they should be, allowing Gluster to repair them.
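If you'd rather review what would be removed first, the same find with just -print is harmless:

    find /export/sdb1/data/.glusterfs -type f -links 1 -print

and you can run the removal once the list looks sane.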

Btw, the self-heal errors would be in glustershd.log and/or the client mount log(s), not (usually) the brick logs.
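For example, something like

    grep -i split-brain /var/log/glusterfs/glustershd.log

on each server (path assuming the default log location) would show whether the self-heal daemon has logged anything about those files.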


On 06/07/2015 12:21 PM, Sjors Gielen wrote:
Oops! I accidentally ran the command as non-root on Curacao; that's why there was no output. The actual output is:

curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

For reference, the output on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On Sun, Jun 7, 2015 at 21:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
I'm reading about quorums; I haven't set up anything like that yet.

(In reply to Joe Julian, who responded off-list)

The output of getfattr on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On curacao, the command gives no output.

From `gluster volume status`, it seems that while the brick "curacao:/export/sdb1/data" is online, it has no associated port number. Curacao can connect to the port number provided by Bonaire just fine. There are no firewalls on or between the two machines; they are on the same subnet, connected by Ethernet cables and two switches.
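I'll also double-check on Curacao that the brick process is actually running and listening somewhere, e.g. with something like:

    curacao# ps aux | grep '[g]lusterfsd'
    curacao# netstat -tlnp | grep gluster

(or ss -tlnp, whichever is available).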

By the way, warning messages have just started appearing in /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying "mismatching ino/dev between file X and handle Y". Perhaps they only started just now, even though I launched the full self-heal hours ago.

[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching ino/dev between file /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd (9190215976/2065)

Thanks again!
Sjors

On Sun, Jun 7, 2015 at 19:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
Hi all,

I work at a small, 8-person company that uses Gluster for its primary data storage. We have a volume called "data" that is replicated over two servers (details below). This worked perfectly for over a year, but lately we've been noticing some mismatches between the two bricks, so it seems there has been some split-brain situation that is not being detected or resolved. I have two questions about this:

1) I expected Gluster to (eventually) detect a situation like this; why doesn't it?
2) How do I fix this situation? I've tried an explicit 'heal', but that didn't seem to change anything.

Thanks a lot for your help!
Sjors

------8<------

Volume & peer info: http://pastebin.com/PN7tRXdU
curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681 /export/sdb1/data/Case/21000355/studies.dat

# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat

I expected an input/output error when reading this file, because of the split-brain situation, but got none. There are no entries in the GlusterFS logs on either bonaire or curacao.
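If I read the documentation correctly, there is also a dedicated listing of files Gluster itself considers split-brained:

    bonaire# gluster volume heal data info split-brain

I'll post its output if it turns up anything.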

bonaire# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status
bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0

Brick curacao:/export/sdb1/data/
Number of entries: 0

(Same output on curacao, and hours after this, the md5sums on both bricks still differ.)

curacao# gluster --version
glusterfs 3.6.2 built on Mar  2 2015 14:05:34
Repository revision: git://git.gluster.com/glusterfs.git
(Same version on Bonaire)


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
