On Thu, Sep 5, 2013 at 12:41 AM, Vijay Bellur <vbellur at redhat.com> wrote:

> On 09/03/2013 01:18 PM, Anup Nair wrote:
>
>> GlusterFS version 3.2.2
>>
>> I have a Gluster volume in which one out of the 4 peers/nodes
>> crashed some time ago, prior to my joining here.
>>
>> I see from volume info that the crashed (non-existent) node is still
>> listed as one of the peers and its bricks are also listed. I would like
>> to detach this node and its bricks and rebalance the volume across the
>> remaining 3 peers, but I am unable to do so. Here are my steps:
>>
>> 1. #gluster peer status
>> Number of Peers: 3 -- (note: excluding the one I run this command from)
>>
>> Hostname: dbstore4r294 -- (note: the node/peer that is down)
>> Uuid: 8bf13458-1222-452c-81d3-565a563d768a
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: 172.16.1.90
>> Uuid: 77ebd7e4-7960-4442-a4a4-00c5b99a61b4
>> State: Peer in Cluster (Connected)
>>
>> Hostname: dbstore3r294
>> Uuid: 23d7a18c-fe57-47a0-afbc-1e1a5305c0eb
>> State: Peer in Cluster (Connected)
>>
>> 2. #gluster peer detach dbstore4r294
>> Brick(s) with the peer dbstore4r294 exist in cluster
>>
>> 3. #gluster volume info
>>
>> Volume Name: test-volume
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: dbstore1r293:/datastore1
>> Brick2: dbstore2r293:/datastore1
>> Brick3: dbstore3r294:/datastore1
>> Brick4: dbstore4r294:/datastore1
>> Brick5: dbstore1r293:/datastore2
>> Brick6: dbstore2r293:/datastore2
>> Brick7: dbstore3r294:/datastore2
>> Brick8: dbstore4r294:/datastore2
>> Options Reconfigured:
>> network.ping-timeout: 42s
>> performance.cache-size: 64MB
>> performance.write-behind-window-size: 3MB
>> performance.io-thread-count: 8
>> performance.cache-refresh-timeout: 2
>>
>> Note that the non-existent node/peer is dbstore4r294 (its bricks are
>> /datastore1 and /datastore2, i.e. Brick4 and Brick8).
>>
>> 4. #gluster volume remove-brick test-volume dbstore4r294:/datastore1
>> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
>> Remove brick incorrect brick count of 1 for replica 2
>>
>> 5. #gluster volume remove-brick test-volume dbstore4r294:/datastore1 dbstore4r294:/datastore2
>> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
>> Bricks not from same subvol for replica
>>
>> How do I remove the peer? What are the steps, considering that the node
>> is non-existent?
>
> Do you plan to replace the dead server with a new server? If so, this
> could be a possible sequence of steps:
>

No. We are not going to replace it, so I need to shrink it to a 3-node cluster.

I discovered the issue when one of the nodes hung and I had to reboot it. I
expected the Gluster volume to stay available through a single node failure,
but the volume was non-responsive. Surprised at that, I checked the details and
found it had been running with one node missing for many months, perhaps a year!

I have no node to replace it with, so I am looking for a method by which I can
resize the volume.
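For what it is worth, one possible sequence for this situation (a sketch only,
not verified on 3.2.2, and syntax can differ between releases) is to re-home
the dead node's two bricks onto surviving servers with replace-brick, and only
then detach the peer, since peer detach refuses to run while the peer still
owns bricks. The destination bricks (dbstore1r293:/datastore3 and
dbstore2r293:/datastore3) and the mount point /mnt/test-volume below are
placeholders; any live server with enough free space and a new, empty
directory would do:

1. #gluster volume replace-brick test-volume dbstore4r294:/datastore1 dbstore1r293:/datastore3 commit force
2. #gluster volume replace-brick test-volume dbstore4r294:/datastore2 dbstore2r293:/datastore3 commit force
   (commit force skips data migration, which is unavoidable here because the
   source bricks are gone; the data still exists on the surviving replicas
   dbstore3r294:/datastore1 and dbstore3r294:/datastore2)
3. #gluster peer detach dbstore4r294
   (should now succeed, since dbstore4r294 no longer owns any bricks)
4. #find /mnt/test-volume -noleaf -print0 | xargs --null stat > /dev/null
   (run on a client mount to walk the whole volume and trigger self-heal, so
   the new bricks are populated from their replica partners; 3.2.x has no
   "gluster volume heal" command)

Note that this keeps the volume at 4 x 2 bricks, just hosted on three servers,
rather than shrinking it to 3 x 2. If 3.2.2 refuses "commit force" while the
source brick is offline, upgrading to a newer 3.x release first may be
necessary; in any case, take a backup before trying this.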