Ok, a smaller test case for the release-3.3 branch. I can't seem to remove a brick without somehow breaking the volume:

[14:53:46] root@fs-5.mseeger:~# mkdir /test
[14:55:23] root@fs-5.mseeger:~# cd /test/
[14:55:26] root@fs-5.mseeger:/test# mkdir b1
[14:55:28] root@fs-5.mseeger:/test# mkdir b2
[14:55:29] root@fs-5.mseeger:/test# mkdir b3
[14:55:31] root@fs-5.mseeger:/test# gluster volume create marctest replica 3 fs-5.mseeger:/test/b1 fs-5.mseeger:/test/b2 fs-5.mseeger:/test/b3
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Do you still want to continue creating the volume? (y/n) y
Creation of volume marctest has been successful. Please start the volume to access data.
[14:56:07] root@fs-5.mseeger:/test# gluster volume start marctest
Starting volume marctest has been successful
[14:57:40] root@fs-5.mseeger:/test# gluster volume info marctest

Volume Name: marctest
Type: Replicate
Volume ID: a25ee38b-156c-4ea0-87d6-0522af615c72
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fs-5.mseeger:/test/b1
Brick2: fs-5.mseeger:/test/b2
Brick3: fs-5.mseeger:/test/b3

[14:57:44] root@fs-5.mseeger:/test# gluster volume remove-brick marctest replica 2 fs-5.mseeger:/test/b3 start
Remove Brick start unsuccessful
[14:57:52] root@fs-5.mseeger:/test# gluster volume info marctest

Volume Name: marctest
Type: Distributed-Replicate
Volume ID: a25ee38b-156c-4ea0-87d6-0522af615c72
Status: Started
Number of Bricks: 1 x 2 = 3
Transport-type: tcp
Bricks:
Brick1: fs-5.mseeger:/test/b1
Brick2: fs-5.mseeger:/test/b2
Brick3: fs-5.mseeger:/test/b3

[14:58:03] root@fs-5.mseeger:/test# gluster volume remove-brick marctest replica 2 fs-5.mseeger:/test/b3 start
number of bricks provided (1) is not valid. need at least 2 (or 2xN)
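(Side note: the plain remove-brick form without the "start" keyword, as in Bobby's commands quoted further down, apparently drops the replica count in one step. A sketch of what I'd try next; untested on this release-3.3 build, and "force" is only a guess based on the "force flag is not set" line in the logs below:)

# drop b3 and go from replica 3 to replica 2 without the start/migrate phase
gluster volume remove-brick marctest replica 2 fs-5.mseeger:/test/b3
# if the CLI on this build insists on a keyword, maybe:
gluster volume remove-brick marctest replica 2 fs-5.mseeger:/test/b3 force
# sanity-check afterwards
gluster volume info marctest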
[14:58:56] root@fs-5.mseeger:/test# gluster volume stop marctest
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume marctest has been successful
[15:01:22] root@fs-5.mseeger:/test# gluster volume start marctest
Starting volume marctest has been unsuccessful

These are the log file entries for the initial removal:

[2013-06-11 14:57:44.498903] I [glusterd-handler.c:866:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-06-11 14:57:52.758892] I [glusterd-brick-ops.c:601:glusterd_handle_remove_brick] 0-glusterd: Received rem brick req
[2013-06-11 14:57:52.758892] I [glusterd-brick-ops.c:642:glusterd_handle_remove_brick] 0-management: request to change replica-count to 2
[2013-06-11 14:57:52.758892] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: fs-5.mseeger:/test/b3
[2013-06-11 14:57:52.758892] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2013-06-11 14:57:52.758892] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 7c798980-5413-484c-ac33-aeb873acec7d
[2013-06-11 14:57:52.758892] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2013-06-11 14:57:52.758892] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: f2bb435f-5db3-4ea9-b640-fc5aab3fdf76
[2013-06-11 14:57:52.758892] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 1 peers
[2013-06-11 14:57:52.758892] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: f2bb435f-5db3-4ea9-b640-fc5aab3fdf76
[2013-06-11 14:57:52.758892] I [glusterd-op-sm.c:3487:glusterd_bricks_select_remove_brick] 0-management: force flag is not set
[2013-06-11 14:57:52.758892] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: fs-5.mseeger:/test/b3
[2013-06-11 14:57:52.758892] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2013-06-11 14:57:52.768892] I [glusterd-brick-ops.c:1590:glusterd_op_remove_brick] 0-management: changing replica count 3 to 2 on volume marctest
[2013-06-11 14:57:52.768892] E [glusterd-volgen.c:2158:volgen_graph_build_clients] 0-: volume inconsistency: total number of bricks (3) is not divisible with number of bricks per cluster (2) in a multi-cluster setup
[2013-06-11 14:57:52.768892] E [glusterd-volgen.c:3286:glusterd_create_volfiles_and_notify_services] 0-management: Could not generate trusted client volfiles
[2013-06-11 14:57:52.768892] W [glusterd-brick-ops.c:1609:glusterd_op_remove_brick] 0-management: failed to create volfiles
[2013-06-11 14:57:52.768892] E [glusterd-op-sm.c:2350:glusterd_op_ac_send_commit_op] 0-management: Commit failed
[2013-06-11 14:57:52.768892] I [glusterd-op-sm.c:2254:glusterd_op_modify_op_ctx] 0-management: op_ctx modification not required
[2013-06-11 14:57:52.768892] I [glusterd-rpc-ops.c:607:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: f2bb435f-5db3-4ea9-b640-fc5aab3fdf76
[2013-06-11 14:57:52.768892] I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2013-06-11 14:58:03.018878] I [glusterd-handler.c:866:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2013-06-11 14:58:56.278813] I [glusterd-brick-ops.c:601:glusterd_handle_remove_brick] 0-glusterd: Received rem brick req
[2013-06-11 14:58:56.278813] I [glusterd-brick-ops.c:642:glusterd_handle_remove_brick] 0-management: request to change replica-count to 2
[2013-06-11 14:58:56.278813] W [glusterd-brick-ops.c:319:gd_rmbr_validate_replica_count] 0-management: number of bricks provided (1) is not valid. need at least 2 (or 2xN)
[2013-06-11 14:58:56.278813] E [glusterd-brick-ops.c:844:glusterd_handle_remove_brick] 0-: number of bricks provided (1) is not valid. need at least 2 (or 2xN)
[2013-06-11 15:01:05.688935] I [glusterd-volume-ops.c:354:glusterd_handle_cli_stop_volume] 0-glusterd: Received stop vol reqfor volume marctest
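(The volgen error above also explains the odd "Number of Bricks: 1 x 2 = 3" output: glusterd has already committed the new replica count of 2 locally, but the volume still lists all 3 bricks, so the distribute x replica math no longer adds up and volfile generation refuses the graph. My reading of the log, not the actual glusterd code, roughly:)

brick_count=3     # bricks still listed in the volume
replica_count=2   # already lowered by the half-committed remove-brick
# "volume info" prints distribute-count x replica-count = brick-count:
echo "$((brick_count / replica_count)) x ${replica_count} = ${brick_count}"   # -> 1 x 2 = 3
# volfile generation then bails out because the brick count is not a multiple of the replica count:
[ $((brick_count % replica_count)) -eq 0 ] || echo "volume inconsistency: bricks not divisible by replica count"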
On Jun 11, 2013, at 3:01 PM, Bobby Jacob <bobby.jacob@alshaya.com> wrote:

> Hi All,
>
> I'm using the following glusterFS version:
> glusterfs 3.3.1 built on Oct 11 2012
>
> I was successfully able to remove bricks from a 4-replica volume by reducing
> the replica count to 3. My "gluster volume status" displayed the status of
> the volume to be a 3-Mode Replicate volume. Further, I removed another brick
> by reducing the replica count to 2.
>
> Later, I added another node using add-brick and increasing the replica count
> to 3. ALL WORKED FINE FOR ME. !!
>
> Here are the commands I used:
> 1) gluster volume remove-brick Cloud-data replica 3 GSNODE01:/mnt/brick1
>    (Changed Replica count from 4 to 3)
> 2) gluster volume remove-brick Cloud-data replica 2 GSNODE01:/mnt/brick2
>    (Changed Replica count from 3 to 2)
> 3) gluster volume add-brick Cloud-data replica 3 GSNODE01:/brick4
>    (Changed Replica count from 2 to 3)
>
> Thanks & Regards,
>
> Bobby Jacob
> Senior Technical Systems Engineer | eGroup
>
> -----Original Message-----
> From: gluster-users-bounces@gluster.org
> [mailto:gluster-users-bounces@gluster.org] On Behalf Of Marc Seeger
> Sent: Tuesday, June 11, 2013 3:42 PM
> To: gluster-users@gluster.org
> Subject: Removing bricks from a replicated setup completely breaks volume on Gluster 3.3
>
> Initial setup: A replicated volume with 3 bricks
> Goal: Remove one of the bricks from it.
> Version: glusterfs 3.3git built on Jun 7 2013 14:38:02 (branch release-3.3)
> Outcome: A completely broken volume
>
> ------------- Volume info -------------
>
> root@fs-14.example:~# gluster volume info
>
> Volume Name: test-fs-cluster-1
> Type: Replicate
> Volume ID: 752e7ffd-04bb-4234-8d16-d1f49ef510b7
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fs-14.example.com:/mnt/brick21
> Brick2: fs-15.example.com:/mnt/brick20
> Brick3: fs-14.example.com:/mnt/brick33
>
> ------------- Trying to remove a brick -------------
>
> fields-config-gluster.rb[5035]: Using commandline: gluster volume remove-brick test-fs-cluster-1 replica 2 fs-14.example.com:/mnt/brick33 start
> fields-config-gluster.rb[5035]: Command returned exit code 255: gluster volume remove-brick test-fs-cluster-1 replica 2 fs-14.example.com:/mnt/brick33 start
> stdout was:
>
> stderr was:
> Remove Brick start unsuccessful
>
> ------------- Volume turned Distributed-Replicate -------------
>
> [12:23:37] root@fs-14.example:~# gluster volume info
>
> Volume Name: test-fs-cluster-1
> Type: Distributed-Replicate
> Volume ID: 752e7ffd-04bb-4234-8d16-d1f49ef510b7
> Status: Started
> Number of Bricks: 1 x 2 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fs-14.example.com:/mnt/brick21
> Brick2: fs-15.example.com:/mnt/brick20
> Brick3: fs-14.example.com:/mnt/brick33
>
> ------------- Trying to remove brick again -------------
>
> [12:26:20] root@fs-14.example:~# gluster volume remove-brick test-fs-cluster-1 replica 2 fs-14.example.com:/mnt/brick33 start
> number of bricks provided (1) is not valid. need at least 2 (or 2xN)
>
> ------------- Trying to stop volume -------------
>
> [12:28:34] root@fs-14.example:~# gluster volume stop test-fs-cluster-1
> Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
> Stopping volume test-fs-cluster-1 has been successful
>
> ------------- Trying to start volume again -------------
>
> [12:29:03] root@fs-14.example:~# gluster volume start test-fs-cluster-1
> Starting volume test-fs-cluster-1 has been unsuccessful
>
> ------------- Trying to stop volume again -------------
>
> [12:29:49] root@fs-14.example:~# gluster volume stop test-fs-cluster-1
> Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
> Volume test-fs-cluster-1 is not in the started state
>
> ------------- Trying to delete volume -------------
>
> [12:29:55] root@fs-14.example:~# gluster volume delete test-fs-cluster-1
> Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
> Volume test-fs-cluster-1 has been started.Volume needs to be stopped before deletion.
>
> ------------- Checking volume info -------------
>
> # gluster volume info
>
> Volume Name: test-fs-cluster-1
> Type: Distributed-Replicate
> Volume ID: 752e7ffd-04bb-4234-8d16-d1f49ef510b7
> Status: Started
> Number of Bricks: 1 x 2 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fs-14.example.com:/mnt/brick21
> Brick2: fs-15.example.com:/mnt/brick20
> Brick3: fs-14.example.com:/mnt/brick33
>
> ------------- Trying to stop volume again -------------
>
> [12:30:50] root@fs-14.example:~# gluster volume stop test-fs-cluster-1
> Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
> Volume test-fs-cluster-1 is not in the started state
>
> ------------- Restarting glusterfs-server -------------
>
> [12:38:05] root@fs-14.example:~# /etc/init.d/glusterfs-server restart
> glusterfs-server start/running, process 6426
>
> ------------- Volume switched back to "Replicate" -------------
>
> [12:38:33] root@fs-14.example:~# gluster volume info
>
> Volume Name: test-fs-cluster-1
> Type: Replicate
> Volume ID: 752e7ffd-04bb-4234-8d16-d1f49ef510b7
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: fs-14.example.com:/mnt/brick21
> Brick2: fs-15.example.com:/mnt/brick20
> Brick3: fs-14.example.com:/mnt/brick33
>
> ------------- Trying to stop volume again -------------
>
> [12:38:39] root@fs-14.example:~# gluster volume stop test-fs-cluster-1
> Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
> Volume test-fs-cluster-1 is not in the started state
>
> Any idea what's up with that?
>
> Cheers,
> Marc
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
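For anyone else who hits this: the only step in the transcript above that got the volume definition back to "Type: Replicate, 1 x 3 = 3" was bouncing glusterd, so that is the recovery sequence I would sketch. Untested beyond what is shown above; the init script name depends on your packaging, and it presumably needs doing on every peer whose copy of the volume went stale:

/etc/init.d/glusterfs-server restart     # Debian/Ubuntu packaging; "service glusterd restart" on RPM-based systems
gluster volume info test-fs-cluster-1    # should report Type: Replicate and 1 x 3 = 3 again
gluster volume status test-fs-cluster-1  # check whether the brick processes actually came back up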