Our production GlusterFS 3.3.1GA setup is a 3x2 distribute-replicate, with 100TB usable for staff. This is one of four identical GlusterFS clusters we're running.

Very early in the life of our production Gluster rollout, we ran Netatalk 2.X to share files with Mac OS X clients (due to slow negative lookups on CIFS/Samba for those pesky resource-fork files in Mac OS X's Finder). Netatalk 2.X wrote its CNID_DB files back to Gluster, which caused enormous IO and locked up many nodes at a time (lots of "hung task" errors in dmesg/syslog). We've since moved to Netatalk 3.X, which puts its CNID_DB files elsewhere (we put them on local SSD RAID), and the lockups have vanished.

However, our split-brain files number in the tens of thousands due to those previous lockups, and they aren't always predictable (i.e. it's not always the case that brick0 is "good" and brick1 is "bad"). Manually fixing the files is far too time-consuming. I've written a rudimentary script that trawls /var/log/glusterfs/glustershd.log for split-brain GFIDs, tracks each one down on the matching pair of bricks, and figures out via a few rules which is the "good" file (size tends to be a good indicator for us, as bigger files tend to be more recent ones). This works for about 80% of files, which will dramatically reduce the amount of data we have to check manually.

My question is: what should I do from here? Options are:

Option 1) Delete the file from the "bad" brick.

Option 2) rsync the file from the "good" brick to the "bad" brick with the -aX flag (preserve everything, including the trusted.afr.$server and trusted.gfid xattrs).

Option 3) rsync the file from "good" to "bad", and then setfattr -x trusted.* on the bad brick.

Which of these is considered the better (more glustershd-compatible) option? Or alternatively, is there something else that's preferred?

Normally I'd just test this on our backup Gluster; however, as it was never running Netatalk, it has no split-brain problems, so I can't test the functionality.

Thanks for any insight provided,

-Dan
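
P.S. In case it helps frame the question, here's a simplified sketch of what the trawling script does. It's reduced to a single replica pair with both bricks visible on one box (in reality our bricks live on different servers and we go over ssh); the brick paths are made up, and the exact glustershd.log message format may differ between 3.3.x builds:

  #!/bin/bash
  # Simplified sketch of the GFID-trawling script described above.
  # Assumptions: one replica pair, both bricks mounted locally at the
  # (hypothetical) paths below.
  BRICK0=/bricks/brick0
  BRICK1=/bricks/brick1
  LOG=/var/log/glusterfs/glustershd.log

  # Pull unique GFIDs out of the split-brain log entries.
  gfids=$(grep -o '<gfid:[0-9a-f-]*>' "$LOG" | tr -d '<>' | cut -d: -f2 | sort -u)

  for gfid in $gfids; do
      # GlusterFS keeps a hardlink to every file at .glusterfs/aa/bb/<gfid>,
      # where aa and bb are the first two byte-pairs of the GFID.
      sub="${gfid:0:2}/${gfid:2:2}/$gfid"
      for brick in "$BRICK0" "$BRICK1"; do
          link="$brick/.glusterfs/$sub"
          [ -e "$link" ] || continue
          # Resolve the hardlink back to the file's real path on the brick.
          path=$(find "$brick" -samefile "$link" \
                 -not -path '*/.glusterfs/*' 2>/dev/null | head -1)
          size=$(stat -c %s "$link")
          echo "$gfid $brick $size $path"
      done
  done
  # A later pass compares the two sizes per GFID and flags the larger
  # copy as the presumed "good" one (the ~80% heuristic mentioned above).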
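
P.P.S. For concreteness, here's roughly how I'd run each option per file. Paths, $gfid, and $GOODHOST/$BADHOST are placeholders for whatever the script resolved; I'm not certain the .glusterfs hardlink removal in option 1 is required, but my understanding is the GFID hardlink needs to go along with the file:

  # Option 1: drop the bad copy (and, I believe, its .glusterfs hardlink)
  # and let self-heal recreate it from the good brick.
  ssh $BADHOST "rm /bricks/brick1/path/to/file \
                   /bricks/brick1/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"

  # Option 2: copy the good file, xattrs and all (-X carries over
  # trusted.afr.$server and trusted.gfid).
  rsync -aX $GOODHOST:/bricks/brick0/path/to/file \
            $BADHOST:/bricks/brick1/path/to/file

  # Option 3: as option 2, then strip the trusted.* xattrs on the bad
  # brick (setfattr -x takes one name at a time, hence the loop).
  ssh $BADHOST 'for x in $(getfattr -d -m trusted -e hex \
        /bricks/brick1/path/to/file | grep "^trusted" | cut -d= -f1); do
      setfattr -x "$x" /bricks/brick1/path/to/file
  done'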