Failed rebalance resulting in major problems

gluster at elyograg.org (Shawn Heisey) · Tue, 05 Nov 2013 22:05:31 -0700

We recently added storage servers to our gluster install, running 3.3.1
on CentOS 6.  It went from 40TB usable (8x2 distribute-replicate) to
80TB usable (16x2).  There was a little over 20TB of used space on the
volume.

The add-brick went through without incident, but the rebalance failed
after moving 1.5TB of the approximately 10TB that needed to be moved.  A
side issue is that it took four days for that 1.5TB to move.  I'm aware
that gluster has overhead, and that there's only so much speed you can
get out of gigabit, but a 100Mb/s half-duplex link could have copied the
data faster if it had been a straight copy.
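
For a rough sense of scale, the arithmetic behind that comparison can be
sketched as below.  This is a back-of-the-envelope calculation only: it
assumes decimal units (1TB = 10^12 bytes), ignores all protocol overhead,
and the function names are made up for illustration.

```python
# Back-of-the-envelope throughput check for the rebalance figures.
# Assumptions: decimal units (1 TB = 10**12 bytes), no protocol overhead.

SECONDS_PER_DAY = 86_400


def observed_rate_mbit(tb_moved: float, days: float) -> float:
    """Average rate in megabits/second for tb_moved TB moved in days days."""
    bits = tb_moved * 10**12 * 8
    return bits / (days * SECONDS_PER_DAY) / 10**6


def days_at_link_speed(tb_to_move: float, link_mbit: float) -> float:
    """Days needed to move tb_to_move TB at a sustained link_mbit Mb/s."""
    bits = tb_to_move * 10**12 * 8
    return bits / (link_mbit * 10**6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1.5TB in four days, as reported above.
    print(f"observed rate: {observed_rate_mbit(1.5, 4):.1f} Mb/s")
    # The same 1.5TB over a fully utilized 100Mb/s link.
    print(f"1.5TB at 100 Mb/s: {days_at_link_speed(1.5, 100):.1f} days")
```

Under those assumptions the rebalance averaged roughly 35Mb/s, while a
saturated 100Mb/s link would need about 1.4 days for the same 1.5TB.  A
real half-duplex link would not sustain the full 100Mb/s, so the second
figure is a best case rather than a measurement.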

After I discovered that the rebalance had failed, I noticed that there
were other problems.  There are a small number of completely lost files
(91 that I know about so far), a huge number of permission issues (over
800,000 files changed to mode 000), and about 32,000 files that are throwing
read errors via the fuse/nfs mount but seem to be available directly on
bricks.  That last category of problem file has the sticky bit set, with
almost all of them having ---------T permissions.  The good files on
bricks typically have the same permissions, but are readable by root.  I
haven't worked out the scripting necessary to automate all the fixing
that needs to happen yet.
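
As a starting point for that scripting, a read-only inventory pass over
a brick directory might look like the sketch below.  It only illustrates
the two permission categories described above (mode 000, and sticky bit
only, shown by ls as ---------T).  The function name classify_modes, the
category labels, and the choice to skip the brick's .glusterfs directory
are assumptions for illustration, not part of any gluster tooling, and
the script changes nothing on disk.

```python
import os
import stat
import sys
from collections import defaultdict


def classify_modes(root, skip_dirs=(".glusterfs",)):
    """Walk root and group regular files by suspicious permission bits.

    Returns a mapping with the (hypothetical) keys "mode_000",
    "sticky_only" and "unreadable", each holding a list of paths.
    Read-only: no file is modified.
    """
    found = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                found["unreadable"].append(path)
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            bits = stat.S_IMODE(st.st_mode)  # permission + special bits
            if bits == 0:
                found["mode_000"].append(path)      # ----------
            elif bits == stat.S_ISVTX:
                found["sticky_only"].append(path)   # ---------T
    return found


# Usage: python classify_modes.py /path/to/brick  (path is an example)
if __name__ == "__main__" and len(sys.argv) == 2:
    for category, paths in sorted(classify_modes(sys.argv[1]).items()):
        print(f"{category}: {len(paths)} files")
```

Keeping the pass read-only means the resulting lists can be compared
across bricks before any permissions are changed.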

We really need to know what happened.  We do plan to upgrade to 3.4.1,
but there were some reasons that we didn't want to upgrade before adding
storage.

* Upgrading will result in service interruption to our clients, which
mount via NFS.  It would likely be just a hiccup, with quick failover,
but it's still a service interruption.
* We have a pacemaker cluster providing the shared IP address for NFS
mounting.  It's running CentOS 6.3.  Running "yum upgrade" to upgrade
gluster will also upgrade the OS to CentOS 6.4.  The pacemaker in 6.4 is
incompatible with the pacemaker in 6.3, which will likely result in
longer-than-expected downtime for the shared IP address.
* We didn't want to risk potential problems with running gluster 3.3.1
on the existing servers and 3.4.1 on the new servers.
* We needed the new storage added right away, before we could schedule
maintenance to deal with the upgrade issues.

It would be extremely helpful to obtain the services of an expert-level
gluster consultant who can look over everything we've done, tell us
whether we've done anything wrong, and advise how we might avoid
problems in the future.  I don't know how much the company can authorize
for this, but we obviously want it to be as cheap as possible.  We are
in Salt Lake City, UT, USA.  It would be preferable to have the
consultant be physically present at our location.

I'm working on redacting one bit of identifying info from our rebalance
log; once that's done I can put it up on Dropbox for everyone to examine.

Thanks,
Shawn