Re: Gluster does not seem to detect a split-brain situation

(Oops... I hate it when I reply off-list.)

That warning should, imho, be an error. That's saying that the handle, which should be a hardlink to the file, doesn't have a matching inode. It should if it's a hardlink.
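
A quick way to check this, assuming GNU coreutils stat (paths taken from the quoted warning below): both names should report the same inode number and a hard-link count greater than 1.

    # %i = inode number, %h = hard link count, %n = file name
    stat -c '%i %h %n' \
        /export/sdb1/data/Archive/S21/21008971/studies.dat \
        /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd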

If it were me, I would:

    find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm

This would clean up any handles that are not hardlinked where they should be, and would allow gluster to repair them.
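
To preview what that would remove before actually deleting anything, the same find can simply print the candidates (handles with a link count of 1, i.e. not hardlinked to any file):

    find /export/sdb1/data/.glusterfs -type f -links 1 -print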

Btw, the self-heal errors would be in glustershd.log and/or the client mount log(s), not (usually) the brick logs.

On 06/07/2015 12:21 PM, Sjors Gielen wrote:
Oops! I accidentally ran the command as non-root on Curacao; that's why there was no output. The actual output is:

curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.afr.data-client-0=0x000000000000000000000000
trusted.afr.data-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

For reference, the output on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On Sun, Jun 7, 2015 at 21:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
I'm reading about quorums, I haven't set up anything like that yet.

(In reply to Joe Julian, who responded off-list)

The output of getfattr on bonaire:

bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat
getfattr: Removing leading '/' from absolute path names
# file: export/sdb1/data/Case/21000355/studies.dat
trusted.gfid=0xfb34574974cf4804b8b80789738c0f81

On curacao, the command gives no output.

From `gluster volume status`, it seems that while the brick "curacao:/export/sdb1/data" is online, it has no associated port number. Curacao can connect to the port number provided by Bonaire just fine. There are no firewalls on or between the two machines; they are on the same subnet, connected by Ethernet cables and two switches.
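
(From what I've read so far, a brick that is listed as online but has no port usually means its glusterfsd process is not listening, and the documented way to respawn only the missing brick processes seems to be the command below. I haven't verified that this is the right fix here.)

    bonaire# gluster volume start data force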

By the way, warning messages just started appearing in /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire, saying "mismatching ino/dev between file X and handle Y". They may have only just begun, even though I started the full self-heal hours ago.

[2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching ino/dev between file /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd (9190215976/2065)

Thanks again!
Sjors

On Sun, Jun 7, 2015 at 19:13, Sjors Gielen <sjors@xxxxxxxxxxxxxx> wrote:
Hi all,

I work at a small, 8-person company that uses Gluster for its primary data storage. We have a volume called "data" that is replicated over two servers (details below). This worked perfectly for over a year, but lately we've been noticing mismatches between the two bricks, so it seems a split-brain situation has occurred that is not being detected or resolved. I have two questions about this:

1) I expected Gluster to (eventually) detect a situation like this; why doesn't it?
2) How do I fix this situation? I've tried an explicit 'heal', but that didn't seem to change anything.

Thanks a lot for your help!
Sjors

------8<------

Volume & peer info: http://pastebin.com/PN7tRXdU
curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /export/sdb1/data/Case/21000355/studies.dat
bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat
28c950a1e2a5f33c53a725bf8cd72681 /export/sdb1/data/Case/21000355/studies.dat

# mallorca is one of the clients
mallorca# md5sum /data/Case/21000355/studies.dat
7bc2daec6be953ffae920d81fe6fa25c  /data/Case/21000355/studies.dat

I expected an input/output error when reading this file, because of the split-brain situation, but got none. There are no entries in the GlusterFS logs of either bonaire or curacao.
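
If I understand the documentation correctly, gluster 3.6 can also list the entries it has actually flagged as split-brain, which should make any detected discrepancy visible:

    bonaire# gluster volume heal data info split-brain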

bonaire# gluster volume heal data full
Launching heal operation to perform full self heal on volume data has been successful
Use heal info commands to check status
bonaire# gluster volume heal data info
Brick bonaire:/export/sdb1/data/
Number of entries: 0

Brick curacao:/export/sdb1/data/
Number of entries: 0

(Same output on curacao, and hours after this, the md5sums on both bricks still differ.)
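
One thing I still plan to try, based on what I've read (unverified): accessing the file through a client mount is supposed to trigger a lookup, and with it a self-heal attempt on that single file:

    mallorca# stat /data/Case/21000355/studies.dat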

curacao# gluster --version
glusterfs 3.6.2 built on Mar  2 2015 14:05:34
Repository revision: git://git.gluster.com/glusterfs.git
(Same version on Bonaire)


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
