Hi Pete,

Thanks for that link. I'm going to try this en masse on an unimportant
directory over the weekend.

-Dan

On 11 April 2013 01:41, Pete Smith <pete at realisestudio.com> wrote:
> Hi Dan
>
> I've come up against this recently whilst trying to delete large amounts
> of files from our cluster.
>
> I'm resolving it with the method from
> http://comments.gmane.org/gmane.comp.file-systems.gluster.user/1917
>
> With Fabric as a helping hand, it's not too tedious.
>
> Not sure about the level of glustershd compatibility, but it's working
> for me.
>
> HTH
>
> Pete
>
> On 10 April 2013 11:44, Daniel Mons <daemons at kanuka.com.au> wrote:
>>
>> Our production GlusterFS 3.3.1GA setup is a 3x2 distribute-replicate,
>> with 100TB usable for staff. This is one of four identical GlusterFS
>> clusters we're running.
>>
>> Very early in the life of our production Gluster rollout, we ran
>> Netatalk 2.X to share files with Mac OS X clients (due to slow negative
>> lookups on CIFS/Samba for those pesky resource-fork files in Mac OS X's
>> Finder). Netatalk 2.X wrote its CNID_DB files back to Gluster, which
>> caused enormous I/O and locked up many nodes at a time (lots of "hung
>> task" errors in dmesg/syslog).
>>
>> We've since moved to Netatalk 3.X, which puts its CNID_DB files
>> elsewhere (we put them on local SSD RAID), and the lockups have
>> vanished. However, our split-brain files number in the tens of
>> thousands due to those earlier lockups, and they aren't always
>> predictable (i.e. it's not always the case that brick0 is "good" and
>> brick1 is "bad"). Manually fixing the files is far too time-consuming.
>>
>> I've written a rudimentary script that trawls
>> /var/log/glusterfs/glustershd.log for split-brain GFIDs, tracks each
>> one down on the matching pair of bricks, and decides via a few rules
>> (size tends to be a good indicator for us, as bigger files tend to be
>> more recent) which is the "good" file. This works for about 80% of
>> files, which will dramatically reduce the amount of data we have to
>> check manually.
>>
>> My question is: what should I do from here? The options are:
>>
>> Option 1) Delete the file from the "bad" brick
>>
>> Option 2) rsync the file from the "good" brick to the "bad" brick
>> with the -aX flag (preserve everything, including the
>> trusted.afr.$server and trusted.gfid xattrs)
>>
>> Option 3) rsync the file from "good" to "bad", and then setfattr -x
>> trusted.* on the bad brick
>>
>> Which of these is considered the better (more glustershd-compatible)
>> option? Or is there something else that's preferred?
>>
>> Normally I'd just test this on our backup Gluster, but as it was never
>> running Netatalk, it has no split-brain problems, so I can't test the
>> functionality.
>>
>> Thanks for any insight provided,
>>
>> -Dan
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
> --
> Pete Smith
> DevOp/System Administrator
> Realise Studio
> 12/13 Poland Street, London W1F 8QB
> T. +44 (0)20 7165 9644
>
> realisestudio.com
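
For reference, a minimal sketch of Dan's "option 1" above (remove the bad
copy from one brick so the self-heal daemon can recreate it from the good
replica). It assumes the GlusterFS 3.3 on-disk layout, where every file on
a brick carries a trusted.gfid xattr and a matching hard link under
.glusterfs/<aa>/<bb>/<gfid>. The brick path and file name are placeholders,
and this is a sketch to adapt and test on scratch data, not a drop-in tool.

#!/usr/bin/env python3
# Sketch: remove the "bad" copy of a split-brain file from one brick,
# together with its hard link under .glusterfs, so self-heal can rebuild
# it from the good replica. Run as root on the brick server.

import os
import uuid

def remove_bad_copy(brick_root, rel_path):
    """Delete rel_path from brick_root plus its .glusterfs/<aa>/<bb>/<gfid> link."""
    bad_file = os.path.join(brick_root, rel_path)

    # trusted.gfid is stored on each brick file as a 16-byte binary UUID.
    gfid = str(uuid.UUID(bytes=os.getxattr(bad_file, "trusted.gfid")))
    gfid_link = os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

    os.unlink(bad_file)
    if os.path.exists(gfid_link):
        os.unlink(gfid_link)

if __name__ == "__main__":
    # Hypothetical invocation -- brick path and file are placeholders.
    remove_bad_copy("/export/brick0", "projects/shot_042/plate.exr")

The reason for removing both links is that the named file and the
.glusterfs/<gfid> entry point at the same inode; deleting only one leaves
the stale data reachable on the bad brick. Once both are gone, a stat of
the file through a client mount (or a pass of the self-heal daemon) should
repopulate that brick from the good copy.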