No problem. Glad to be of help.

On 14 April 2013 05:31, Daniel Mons <daemons at kanuka.com.au> wrote:
> Running the following script on an unimportant tree in the cluster
> this weekend as a test. So far, so good, and it appears to be doing
> what I want.
>
> Thanks again Pete for the recommendation.
>
> -Dan
>
>
> #!/bin/bash
> BR='-------------------------'
> UUID=$(/usr/bin/uuidgen)
> if [ "$UUID" == "" ]
> then
>     echo "UUID is null"
>     exit 1
> fi
> find "/mnt/blah/" -type f | while read FILE
> do
>     DNAME=$(dirname "${FILE}")
>     FNAME=$(basename "${FILE}")
>     cd "${DNAME}"
>     if (( $? > 0 ))
>     then
>         echo "Bad cd operation"
>         exit 1
>     fi
>     pwd
>     mv -v "${FNAME}" "${FNAME}.${UUID}"
>     if (( $? > 0 ))
>     then
>         echo "Bad mv operation"
>         exit 1
>     fi
>     cp -pv "${FNAME}.${UUID}" "${FNAME}"
>     if (( $? > 0 ))
>     then
>         echo "Bad cp operation"
>         exit 1
>     fi
>     rm -fv "${FNAME}.${UUID}"
>     if (( $? > 0 ))
>     then
>         echo "Bad rm operation"
>         exit 1
>     fi
>     echo "${BR}"
> done
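
The quoted script reads names with a bare "read" and does a "cd" inside a piped "while", so filenames containing backslashes or leading/trailing whitespace can be mangled, and an "exit 1" inside the loop only leaves the pipeline's subshell. A minimal hardened sketch of the same rewrite-in-place idea, assuming bash and GNU find, with the tree path as a placeholder (untested here):

#!/bin/bash
# Sketch only: same mv/cp/rm rewrite as the quoted script, but NUL-delimited
# so filenames with spaces, backslashes or newlines survive, and any failed
# step aborts the whole run via "set -e".
set -euo pipefail

TREE="/mnt/blah"                        # placeholder: tree to rewrite
UUID=$(/usr/bin/uuidgen)
[ -n "$UUID" ] || { echo "UUID is null" >&2; exit 1; }

while IFS= read -r -d '' FILE
do
    TMP="${FILE}.${UUID}"
    mv -v  -- "$FILE" "$TMP"            # move the original aside
    cp -pv -- "$TMP" "$FILE"            # write a fresh copy under the original name
    rm -fv -- "$TMP"                    # drop the moved-aside original
done < <(find "$TREE" -type f -print0)

Whether rewriting a file this way actually clears its split-brain state is still up to glustershd; the sketch only changes how safely the tree is walked.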
>
> On 11 April 2013 22:41, Daniel Mons <daemons at kanuka.com.au> wrote:
> > Hi Pete,
> >
> > Thanks for that link. I'm going to try this en masse on an unimportant
> > directory over the weekend.
> >
> > -Dan
> >
> >
> > On 11 April 2013 01:41, Pete Smith <pete at realisestudio.com> wrote:
> >> Hi Dan
> >>
> >> I've come up against this recently whilst trying to delete large
> >> amounts of files from our cluster.
> >>
> >> I'm resolving it with the method from
> >> http://comments.gmane.org/gmane.comp.file-systems.gluster.user/1917
> >>
> >> With Fabric as a helping hand, it's not too tedious.
> >>
> >> Not sure about the level of glustershd compatibility, but it's
> >> working for me.
> >>
> >> HTH
> >>
> >> Pete
> >> --
> >>
> >>
> >> On 10 April 2013 11:44, Daniel Mons <daemons at kanuka.com.au> wrote:
> >>>
> >>> Our production GlusterFS 3.3.1GA setup is a 3x2 distribute-replicate,
> >>> with 100TB usable for staff. This is one of 4 identical GlusterFS
> >>> clusters we're running.
> >>>
> >>> Very early in the life of our production Gluster rollout, we ran
> >>> Netatalk 2.X to share files with Mac OS X clients (due to slow negative
> >>> lookup on CIFS/Samba for those pesky resource fork files in Mac OS X's
> >>> Finder). Netatalk 2.X wrote its CNID_DB files back to Gluster, which
> >>> caused enormous IO, locking up many nodes at a time (lots of "hung
> >>> task" errors in dmesg/syslog).
> >>>
> >>> We've since moved to Netatalk 3.X, which puts its CNID_DB files
> >>> elsewhere (we put them on local SSD RAID), and the lockups have
> >>> vanished. However, our split-brain files number in the tens of
> >>> thousands due to those previous lockups, and aren't always predictable
> >>> (i.e. it's not always the case that brick0 is "good" and brick1 is
> >>> "bad"). Manually fixing the files is far too time consuming.
> >>>
> >>> I've written a rudimentary script that trawls
> >>> /var/log/glusterfs/glustershd.log for split-brain GFIDs, tracks each one
> >>> down on the matching pair of bricks, and figures out via a few rules
> >>> (size tends to be a good indicator for us, as bigger files tend to be
> >>> more recent ones) which is the "good" file. This works for about 80%
> >>> of files, which will dramatically reduce the amount of data we have to
> >>> manually check.
> >>>
> >>> My question is: what should I do from here? Options are:
> >>>
> >>> Option 1) Delete the file from the "bad" brick
> >>>
> >>> Option 2) rsync the file from the "good" brick to the "bad" brick
> >>> with the -aX flag (preserve everything, including trusted.afr.$server
> >>> and trusted.gfid xattrs)
> >>>
> >>> Option 3) rsync the file from "good" to "bad", and then setfattr -x
> >>> trusted.* on the bad brick.
> >>>
> >>> Which of these is considered the better (more glustershd compatible)
> >>> option? Or alternatively, is there something else that's preferred?
> >>>
> >>> Normally I'd just test this on our backup gluster, however as it was
> >>> never running Netatalk, it has no split-brain problems, so I can't
> >>> test the functionality.
> >>>
> >>> Thanks for any insight provided,
> >>>
> >>> -Dan
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users at gluster.org
> >>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >>
> >> --
> >> Pete Smith
> >> DevOp/System Administrator
> >> Realise Studio
> >> 12/13 Poland Street, London W1F 8QB
> >> T. +44 (0)20 7165 9644
> >>
> >> realisestudio.com

--
Pete Smith
DevOp/System Administrator
Realise Studio
12/13 Poland Street, London W1F 8QB
T. +44 (0)20 7165 9644

realisestudio.com
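
Of the three options in the original question, Option 1 (remove the stale copy from the "bad" brick and let self-heal recreate it from the "good" one) can be sketched roughly as below for a single file. This is only a sketch: BRICK and BADFILE are placeholder values, it assumes the .glusterfs/<aa>/<bb>/<gfid> hard-link layout used by 3.3-era bricks, and it should be run as root on the brick server only after confirming which copy really is the good one.

#!/bin/bash
# Rough sketch of "Option 1" for a single file, run on the brick judged "bad".
# BRICK and BADFILE are placeholders; verify the surviving copy is good first.
set -euo pipefail

BRICK="/data/brick0"                    # placeholder: brick root on this server
BADFILE="projects/foo/file.psd"         # placeholder: path relative to the brick

# Read the file's GFID so the matching .glusterfs hard link can be removed too.
HEX=$(getfattr -n trusted.gfid -e hex "${BRICK}/${BADFILE}" | awk -F'0x' '/trusted.gfid/ {print $2}')
[ ${#HEX} -eq 32 ] || { echo "could not read trusted.gfid" >&2; exit 1; }
GFID="${HEX:0:8}-${HEX:8:4}-${HEX:12:4}-${HEX:16:4}-${HEX:20:12}"

rm -v "${BRICK}/${BADFILE}"
rm -v "${BRICK}/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"

# Stat'ing the file through a client mount (or running "gluster volume heal
# <volname>") should then let glustershd rebuild it from the good brick.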