I had a RAID array fail due to a number of Seagate drives going down, so this gave me an opportunity to check the recovery of gluster volumes. I found that the replicated volume came up just fine, but the non-replicated ones did not. I'm wondering if there's a better solution than simply blowing them away and creating fresh ones (especially one which keeps the surviving half of the data set in the distributed volume).

The platform is Ubuntu 12.04 with glusterfs 3.3.0. There are two nodes, dev-storage1/2, and four volumes:

* A distributed volume across the two nodes:

    Volume Name: fast
    Type: Distribute
    Volume ID: 864fd12d-d879-4310-abaa-a2cb99b7f695
    Status: Started
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/fast
    Brick2: dev-storage2:/disk/storage2/fast

* A replicated volume across the two nodes:

    Volume Name: safe
    Type: Replicate
    Volume ID: 47a8f326-0e48-4a71-9cfe-f9ef8d555db7
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/safe
    Brick2: dev-storage2:/disk/storage2/safe

* Two single-brick volumes, one on each node:

    Volume Name: single1
    Type: Distribute
    Volume ID: 74d62eb4-176e-4671-8471-779d909e19f0
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage1:/disk/storage1/single1

    Volume Name: single2
    Type: Distribute
    Volume ID: edab496f-c204-4122-ad10-c5f2e2ac92bd
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: dev-storage2:/disk/storage2/single2

These four volumes are FUSE-mounted on /gluster/safe, /gluster/fast, /gluster/single1 and /gluster/single2 on both servers. The bricks share their underlying filesystems, i.e. the bricks on dev-storage1 all live under /disk/storage1 and those on dev-storage2 under /disk/storage2.

Now, the filesystem at dev-storage1:/disk/storage1 failed. I created a new filesystem mounted at /disk/storage1, did

    mkdir /disk/storage1/{single1,safe,fast}

and restarted glusterd (the full sequence is sketched in the P.S. below). After a couple of minutes, the contents of the replicated volume ("safe") were synchronised between the two nodes. That is,

    ls -lR /gluster/safe
    ls -lR /disk/storage1/safe    # on dev-storage1
    ls -lR /disk/storage2/safe    # on dev-storage2

all showed the same. This is excellent.

However, the other two volumes which depend on dev-storage1 are broken. As this is a dev system I could just blow them away, but I would like to use this as an exercise in fixing broken volumes, which I may have to do in production later. Here are the problems:

(1) The "single1" volume is empty, which I expected since its brick is a brand new empty directory, but I cannot create files in it:

    root@dev-storage1:~# touch /gluster/single1/test
    touch: cannot touch `/gluster/single1/test': Read-only file system

I guess gluster doesn't like the lack of metadata on this directory (see the P.S. for how I've been inspecting the brick's extended attributes). Is there a quick recovery procedure here, or do I need to destroy the volume and recreate it?

(2) The "fast" (distributed) volume appears empty to the clients:

    root@dev-storage1:~# ls /gluster/fast
    root@dev-storage1:~#

However, half the content is still available in the brick which didn't fail:

    root@dev-storage2:~# ls /disk/storage2/fast
    images  iso
    root@dev-storage2:~#

Although this is a test system, ideally I would like to reactivate this volume and make the surviving half of the data set available. I guess I could destroy the volume, move the data to a safe place, create a new volume and copy the data back in (a sketch of that fallback is in the P.S.). Is there a more direct way?

Thanks,

Brian.
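
P.S. For reference, the recovery sequence I ran on dev-storage1 after the array was rebuilt was roughly the following. The device name and the choice of xfs are illustrative (I don't have the exact command history), but the steps are as described above:

    # recreate a filesystem on the rebuilt array and mount it
    # where the old one was (device name illustrative)
    mkfs.xfs /dev/md0
    mount /dev/md0 /disk/storage1

    # recreate the (now empty) brick directories the volumes expect
    mkdir /disk/storage1/{single1,safe,fast}

    # restart the gluster management daemon so it picks the bricks up
    service glusterfs-server restart   # Ubuntu package name; plain glusterd if built from source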
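
For problem (1), my guess about "missing metadata" comes from comparing the bricks' extended attributes: gluster 3.3 stamps each brick root with trusted.* attributes (trusted.glusterfs.volume-id in particular), and the freshly-created directory on dev-storage1 has none of them:

    # dump all extended attributes on the brick roots, in hex
    # (getfattr is in the "attr" package on Ubuntu)
    getfattr -m . -d -e hex /disk/storage2/single2   # surviving brick, for comparison
    getfattr -m . -d -e hex /disk/storage1/single1   # rebuilt brick: trusted.* attributes missing

I'm wondering whether simply re-stamping trusted.glusterfs.volume-id with setfattr would be enough to bring the brick back, e.g.

    # value is single1's Volume ID from "gluster volume info", as hex -- untested guess
    setfattr -n trusted.glusterfs.volume-id \
        -v 0x74d62eb4176e46718471779d909e19f0 /disk/storage1/single1

but I'd rather hear the supported procedure than poke at the xattrs blind.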
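
For problem (2), the fallback I mentioned would look something like this (paths as above; the rescue location is arbitrary). It preserves the surviving files but throws away the volume itself, which is what I'm hoping to avoid:

    # on dev-storage2: copy the surviving half of the data out of the brick
    cp -a /disk/storage2/fast /root/fast-rescue

    # destroy and recreate the volume (the old brick directories, and/or
    # their trusted.* xattrs, may need wiping before the create is accepted)
    gluster volume stop fast
    gluster volume delete fast
    gluster volume create fast dev-storage1:/disk/storage1/fast \
        dev-storage2:/disk/storage2/fast
    gluster volume start fast

    # copy the rescued data back in through the FUSE mount so it is
    # redistributed across both bricks
    cp -a /root/fast-rescue/. /gluster/fast/

If there's a way to simply reattach the surviving brick instead, that's what I'm after.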