Kosher admin practices: What do you do with failed heals? (and out-of-sync replicated bricks)

peek at nimbios.org (Michael Peek) · Wed, 31 Jul 2013 13:19:37 -0400

Hi gurus,

I'm back with more shenanigans.

I've been testing a setup with four machines, two drives in each.  While
running an rsync to back up a bunch of files to the volume I simulated a
drive failure by forcing one of the drives to remount read-only.  I then
took Joe Julian's advice and brought the brick back online by:

1) Killing the glusterfsd that was running on this brick
2) Unmounting, fsck'ing, remounting the drive (with a real drive
failure, of course, I would be replacing the drive)
3) Typing "gluster volume start $vol force"
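
For reference, the three steps above can be sketched as a small script.
This is only an illustration: the volume name, device, mount point and
PID are hypothetical placeholders, and the script defaults to a dry run
that prints each command instead of executing it.  The brick's PID can
be read from "gluster volume status $vol".

```shell
#!/bin/sh
# Sketch of the brick recovery steps. Every value below is a hypothetical
# placeholder and must be replaced with the real one for your system.
VOL=testvol              # hypothetical volume name
BRICK_DEV=/dev/sdb1      # hypothetical block device backing the brick
BRICK_MNT=/bricks/b1     # hypothetical brick mount point
BRICK_PID=12345          # hypothetical PID of the brick's glusterfsd
DRY_RUN=1                # set to 0 to actually execute the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_brick() {
    run kill "$BRICK_PID"                  # 1) stop the brick process
    run umount "$BRICK_MNT"                # 2) unmount, check, remount
    run fsck "$BRICK_DEV"
    run mount "$BRICK_DEV" "$BRICK_MNT"
    run gluster volume start "$VOL" force  # 3) restart the missing brick
}

recover_brick
```

With a real drive failure the fsck step would be replaced by swapping
in and formatting the new drive.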

It seemed to work wonderfully.

Next I decided to wipe the data on the volume with an "rm -fr".  What
I'm left with are a couple of directories that cannot be removed.  I get
a "Directory not empty" error.

When I look at the bricks, the brick that I took offline has a file in
each directory, whereas the replicated brick's directories are empty. 
Specifically, the files left behind are the transient files that rsync
creates when it copies.  They have a nonsensical file extension that
looks like '.iPDK8i'.  Once rsync finishes copying a file it renames the
file, removing the nonsensical extension.  But since the brick in
question was offline when rsync renamed the files, its copies of the
files with the nonsense extension still exist.  But the use of rsync
aside, were this a production volume with active users the same scenario
could still have happened even without rsync.  (In fact, I've created
this type of scenario before without rsync by taking a brick offline
while an "rm -fr" was running.)
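
A rough way to list such leftovers directly on a brick is sketched
below.  It assumes rsync's default temporary naming (a leading dot plus
a six-character random suffix, e.g. ".name.iPDK8i"); the pattern is a
heuristic and can also match unrelated hidden files.  The function name
and the brick path in the usage line are made up for illustration.

```shell
#!/bin/sh
# List files under a brick that look like rsync temporary files,
# skipping gluster's own .glusterfs/ metadata directory.
list_rsync_leftovers() {
    brick=${1%/}    # strip a trailing slash so the -path test matches
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -type f -name '.*.??????' -print
}

# Usage (hypothetical brick path):
#   list_rsync_leftovers /bricks/b1
```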

Gluster reports no split-brain files, but does report some (35) failed
heals.

Next I ran "gluster volume heal $vol force".

Since there are only two files on the whole volume I didn't expect this
to take long.  I've left it alone for an hour.  However, there's no way
that I know of to check and see if the healing process has completed.

The command "gluster volume heal $vol info" still lists the two files in
question as failed heals.  Everything else (the other 33 files reported
by gluster earlier) has been taken care of.
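
Lacking a proper "is it done yet" command, one workaround is to total
the entry counts that the heal info output reports and watch whether
the number drops.  The sketch below assumes each brick section contains
a line of the form "Number of entries: N"; that format is an assumption
and may differ between gluster versions.  The function name is made up.

```shell
#!/bin/sh
# Sum the "Number of entries: N" lines read from stdin.
# Prints 0 when no such line is present.
count_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 }
                 END { print total + 0 }'
}

# Usage (requires a running gluster volume):
#   gluster volume heal "$vol" info | count_heal_entries
```

A count that stays the same across repeated runs suggests the remaining
entries are stuck rather than still being healed.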

So what's the correct way to fix this problem?  I could just delete the
files from the brick directly, but won't that still leave behind
something in the .glusterfs/ metadata directory?
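
As far as I understand the layout, each regular file on a brick has a
hard link under .glusterfs/ named after its GFID, so removing only the
visible file would leave that link behind.  The sketch below maps a
GFID, as printed by "getfattr -n trusted.gfid -e hex <file>", to the
expected link path.  The function name is made up, and the layout
should be verified on your own brick before deleting anything.

```shell
#!/bin/sh
# Map a hex GFID (with or without a leading 0x) to the path of its
# entry under the brick's .glusterfs/ directory:
#   .glusterfs/<first 2 hex digits>/<next 2 hex digits>/<GFID as UUID>
gfid_to_path() {
    hex=${1#0x}
    a=$(printf '%s' "$hex" | cut -c1-2)
    b=$(printf '%s' "$hex" | cut -c3-4)
    # Insert dashes to get the 8-4-4-4-12 UUID form.
    uuid=$(printf '%s' "$hex" | sed \
        's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    printf '.glusterfs/%s/%s/%s\n' "$a" "$b" "$uuid"
}

# Usage (run as root on the brick; the file path is hypothetical):
#   getfattr -n trusted.gfid -e hex /bricks/b1/dir/.file.iPDK8i
#   gfid_to_path 0x0123456789abcdef0123456789abcdef
```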

Does gluster have a mechanism to mark a brick as degraded and force a
re-sync from its replica?  I didn't see anything in the manual about
such a mechanism, but maybe I missed it.

What would happen if I simply used rsync to resync the replica brick's
data, including the .glusterfs/ metadata directory, back onto the
out-of-sync brick?  My guess is such an approach would be disastrous on
a running system unless I at least killed the gluster processes managing
the two bricks.

Michael Peek