Re: Problem in replicating an existing gluster volume from single brick setup to two brick setup

On Thu, Aug 25, 2016 at 7:00 PM, Jabran Asghar <jabran.asghar@xxxxxxxxxxx> wrote:

Greetings,

 

I have a problem replicating an existing Gluster volume from a single-brick setup to a two-brick setup. The background of the problem is as follows:

 

OS: Ubuntu 14.04

Gluster version (from gluster repos): glusterfs 3.7.14 built on Aug  1 2016 16:57:28

 

1. I had a replication setup consisting of two Gluster nodes (srv100, srv102) and two volumes (gv0, gv100).

2. I had to completely rebuild the RAID/disks of one of the nodes (srv100) due to a hardware failure. I did this with the following steps:

2.1 Removed the failed brick from the replication setup (reduced the replica count from 2 to 1, and detached the node). I executed the following commands on the *good* node (srv102):

    sudo gluster volume remove-brick gv100 replica 1 srv100:/pool01/gfs/brick1/gv100 force
    sudo gluster volume remove-brick gv0 replica 1 srv100:/pool01/gfs/brick1/gv0 force
    sudo gluster vol info     # make sure the faulty node's bricks are not listed, and brick count is 1 for each volume
    sudo gluster peer detach srv100 force
    sudo gluster peer status  # --> OK, only one node/brick

 

2.2 Stopped glusterd and killed all gluster processes.

2.3 Replaced the HDs and recreated the RAID. This means all GlusterFS data directories were lost on the faulty node (srv100), while the GlusterFS installation and configuration files were untouched (including the hostname and IP address).

2.4 After rebuilding, I created the volume directories on the rebuilt node.

2.5 Then I started the gluster service and added the node back to the cluster. Peer status is OK (in cluster).


This step should have synced the existing volumes as well. Could you share the glusterd log file of srv100 with us, to check what went wrong at that time? Does restarting glusterd on srv100 bring back the existing volumes under /var/lib/glusterd?
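(A minimal way to check both suggestions on srv100, assuming the default Ubuntu 14.04 glusterfs-server packaging and the default glusterd log location:)

    sudo service glusterfs-server restart                                # restart glusterd on srv100
    ls /var/lib/glusterd/vols/                                           # do gv0 and gv100 reappear here?
    sudo tail -n 100 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log   # glusterd log to share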

 

2.6 Then I attempted to replicate one of the existing volumes (gv0), and *there* came the problem. The replication could not be set up properly and gave the following error:

    sudo gluster volume add-brick gv0 replica 2 srv100:/pool01/gfs/brick1/gv0
        volume add-brick: failed: Staging failed on srv100. Please check log file for details.

 

The relevant gluster log file says:

 

[2016-08-25 12:32:29.499708] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv-temp
[2016-08-25 12:32:29.501881] E [MSGID: 106301] [glusterd-syncop.c:1274:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume gv-temp is not started
[2016-08-25 12:32:29.505033] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
[2016-08-25 12:32:29.508585] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv0 does not exist
[2016-08-25 12:32:29.511062] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv100
[2016-08-25 12:32:29.514556] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv100 does not exist
[2016-08-25 12:33:15.865773] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
[2016-08-25 12:33:15.869441] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv0 does not exist
[2016-08-25 12:33:15.872630] I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv100
[2016-08-25 12:33:15.876199] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Error: Volume gv100 does not exist
[2016-08-25 12:34:14.716735] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
[2016-08-25 12:34:14.716787] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
[2016-08-25 12:34:14.716809] I [MSGID: 106447] [glusterd-brick-ops.c:240:gd_addbr_validate_replica_count] 0-management: Changing the type of volume gv0 from 'distribute' to 'replica'
[2016-08-25 12:34:14.720133] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on srv100. Please check log file for details.
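(A note on the excerpt above: the gd_collate_errors lines report the error string returned by the peer, so the "Volume gv0 does not exist" / "Volume gv100 does not exist" rejections come from srv100 itself. Assuming the default log location, the matching entries on srv100 could be pulled with something like:)

    sudo grep -E 'gv0|gv100' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 20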

 

3. I tried to create a new replicated volume (gv-temp) over the nodes --> it is created and replicated. It is only the existing volumes that I cannot replicate again!

4. I also observed that the /var/lib/glusterd/vols directory on the rebuilt node contains a directory for the newly created volume (gv-temp), but none for the existing volumes (gv0, gv100).

 

 

*Questions:* 

a. How can I re-replicate the existing volumes, for which I set the replica count to 1 (see point 2.1)?

b. Is there a “glusterfs” way to create the missing volume directories (under /var/lib/glusterd/vols) on the rebuilt node (see point 4, and the sketch after these questions)?

c. Any other pointers, hints?
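
(On question b: not confirmed as the fix in this thread, but glusterd does have a volume-configuration sync command that copies the definitions under /var/lib/glusterd/vols from a peer. Run on the rebuilt node (srv100), with srv102 as the peer that still holds the volume definitions, it would look roughly like:)

    sudo gluster volume sync srv102 all   # pull all volume definitions from srv102
    sudo gluster volume info              # gv0 and gv100 should now be listed on this node as well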

 

Thanks.

 

Kind regards,

JAsghar

 




--

--Atin
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
