I had a RAID array fail due to a number of Seagate drives going down, so this gave me an opportunity to check the recovery of gluster volumes. I found that the replicated volume came up just fine, but the non-replicated ones did not. I'm wondering if there's a better solution than simply blowing them away and creating fresh ones (especially one which keeps the surviving half of the data set in the distributed volume).

The platform is Ubuntu 12.04 with glusterfs 3.3.0. There are two nodes, dev-storage1/2, and four volumes:

* A distributed volume across the two nodes:

    Volume Name: fast
    Type: Distribute
    Volume ID: 864fd12d-d879-4310-abaa-a2cb99b7f695
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/fast
    Brick2: dev-storage2:/disk/storage2/fast

* A replicated volume across the two nodes:

    Volume Name: safe
    Type: Replicate
    Volume ID: 47a8f326-0e48-4a71-9cfe-f9ef8d555db7
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/safe
    Brick2: dev-storage2:/disk/storage2/safe

* Two single-brick volumes, one on each node:

    Volume Name: single1
    Type: Distribute
    Volume ID: 74d62eb4-176e-4671-8471-779d909e19f0
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/single1

    Volume Name: single2
    Type: Distribute
    Volume ID: edab496f-c204-4122-ad10-c5f2e2ac92bd
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage2:/disk/storage2/single2

These four volumes are FUSE-mounted on /gluster/safe, /gluster/fast, /gluster/single1 and /gluster/single2 on both servers. The bricks share their underlying filesystems, i.e. the bricks on dev-storage1 all live under /disk/storage1 and those on dev-storage2 under /disk/storage2.

Now, the filesystem at dev-storage1:/disk/storage1 failed. I created a new filesystem mounted at /disk/storage1, did

    mkdir /disk/storage1/{single1,safe,fast}

and restarted glusterd (the full sequence is sketched in the P.S. below). After a couple of minutes, the contents of the replicated volume ("safe") were synchronised between the two nodes. That is,

    ls -lR /gluster/safe
    ls -lR /disk/storage1/safe    # on dev-storage1
    ls -lR /disk/storage2/safe    # on dev-storage2

all showed the same. This is excellent.

However, the other two volumes which depend on dev-storage1 are broken. As this is a dev system I could just blow them away, but I would like to use this as an exercise in fixing broken volumes, which I may have to do in production later. Here are the problems:

(1) The "single1" volume is empty, which I expected since its brick is a brand new empty directory, but I cannot create files in it:

    root@dev-storage1:~# touch /gluster/single1/test
    touch: cannot touch `/gluster/single1/test': Read-only file system

I guess gluster doesn't like the lack of metadata on this directory (see the P.S. for how I've been inspecting the brick's extended attributes). Is there a quick recovery procedure here, or do I need to destroy the volume and recreate it?

(2) The "fast" (distributed) volume appears empty to the clients:

    root@dev-storage1:~# ls /gluster/fast
    root@dev-storage1:~#

However, half the content is still available in the brick which didn't fail:

    root@dev-storage2:~# ls /disk/storage2/fast
    images  iso
    root@dev-storage2:~#

Although this is a test system, ideally I would like to reactivate this volume and make the surviving half of the data set available. I guess I could destroy the volume, move the data to a safe place, create a new volume and copy the data back in (a sketch of that fallback is in the P.S.). Is there a more direct way?

Thanks,

Brian.
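
P.S. For reference, the recovery sequence I ran on dev-storage1 after the array was rebuilt was roughly the following. The device name and the choice of xfs are illustrative (I don't have the exact command history), but the steps are as described above:

    # recreate a filesystem on the rebuilt array and mount it
    # where the old one was (device name illustrative)
    mkfs.xfs /dev/md0
    mount /dev/md0 /disk/storage1

    # recreate the (now empty) brick directories the volumes expect
    mkdir /disk/storage1/{single1,safe,fast}

    # restart the gluster management daemon so it picks the bricks up
    service glusterfs-server restart   # Ubuntu package name; plain glusterd if built from source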
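
For problem (1), my guess about "missing metadata" comes from comparing the bricks' extended attributes: gluster 3.3 stamps each brick root with trusted.* attributes (trusted.glusterfs.volume-id in particular), and the freshly-created directory on dev-storage1 has none of them:

    # dump all extended attributes on the brick roots, in hex
    # (getfattr is in the "attr" package on Ubuntu)
    getfattr -m . -d -e hex /disk/storage2/single2   # surviving brick, for comparison
    getfattr -m . -d -e hex /disk/storage1/single1   # rebuilt brick: trusted.* attributes missing

I'm wondering whether simply re-stamping trusted.glusterfs.volume-id with setfattr would be enough to bring the brick back, e.g.

    # value is single1's Volume ID from "gluster volume info", as hex -- untested guess
    setfattr -n trusted.glusterfs.volume-id \
        -v 0x74d62eb4176e46718471779d909e19f0 /disk/storage1/single1

but I'd rather hear the supported procedure than poke at the xattrs blind.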
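
For problem (2), the fallback I mentioned would look something like this (paths as above; the rescue location is arbitrary). It preserves the surviving files but throws away the volume itself, which is what I'm hoping to avoid:

    # on dev-storage2: copy the surviving half of the data out of the brick
    cp -a /disk/storage2/fast /root/fast-rescue

    # destroy and recreate the volume (the old brick directories, and/or
    # their trusted.* xattrs, may need wiping before the create is accepted)
    gluster volume stop fast
    gluster volume delete fast
    gluster volume create fast dev-storage1:/disk/storage1/fast \
        dev-storage2:/disk/storage2/fast
    gluster volume start fast

    # copy the rescued data back in through the FUSE mount so it is
    # redistributed across both bricks
    cp -a /root/fast-rescue/. /gluster/fast/

If there's a way to simply reattach the surviving brick instead, that's what I'm after.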