Thanks for clearing that up. I had to wait about 30 minutes for all
rebalancing activity to cease, then I was able to add a new brick. What
does it use to migrate the files? The copy rate was pretty slow
considering both bricks were on the same server; I only saw about
200MB/sec. Each brick is a 16-disk ZFS raidz2, and copying with dd I can
get well over 500MB/sec.

On Tue, 2013-12-10 at 11:30 +0530, Kaushal M wrote:
> On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
> > On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> >> Hi Franco,
> >>
> >> If a file is under migration and a rebalance stop is issued, the
> >> rebalance process exits only after that migration completes.
> >>
> >> That might be one of the reasons why you saw the 'rebalance in
> >> progress' message while trying to add the brick.
> >
> > The status said it was stopped. I didn't do a top on the machine, but
> > are you saying that it was still rebalancing despite saying it had
> > stopped?
> >
> The 'stopped' status is a little bit misleading. The rebalance process
> could have been migrating a large file when the stop command was
> issued, so the process would continue migrating that file and quit
> once it finished. During this period, although the status says
> 'stopped', the rebalance process is actually still running, which
> prevents other operations from happening. Ideally we would have a
> 'stopping' status to convey the correct meaning, but for now the only
> way to verify that a rebalance has actually stopped is to monitor the
> rebalance process itself. It is a 'glusterfs' process with arguments
> containing 'rebalance'.
>
> >> Could you please share the average file size in your setup?
> >
> > Bit hard to say, I just copied some data from our main processing
> > system. The sizes range from very small to tens of gigabytes.
> >
> >> You could always check the rebalance status command to ensure the
> >> rebalance has indeed completed/stopped before proceeding with the
> >> add-brick. Using add-brick force while a rebalance is ongoing should
> >> be avoided in normal scenarios. I do see that in your case they show
> >> stopped/completed. Glusterd logs would help in triaging the issue.
> >
> > See attached.
> >
> >> Rebalance rewrites layouts and migrates data. If an add-brick is done
> >> while this is happening, the cluster might end up in an imbalanced
> >> state. Hence the check for an in-progress rebalance when doing
> >> add-brick.
> >
> > I can see that, but as far as I could tell the rebalance had stopped
> > according to the status.
> >
> > Just to be clear, what command restarts the rebalancing?
> >
> >> With regards,
> >> Shishir
> >>
> >> On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
> >>
> >> Before attempting a rebalance on my existing distributed Gluster
> >> volume I thought I'd do some testing with my new storage. I created a
> >> volume consisting of 4 bricks on the same server and wrote some data
> >> to it. I then added a new brick from another server. I ran the
> >> fix-layout and wrote some new files and could see them on the new
> >> brick. All good so far, so I started the data rebalance. After it had
> >> been running for a while I wanted to add another brick, which I
> >> obviously couldn't do while it was running, so I stopped it.
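
[For the archives, the sequence described above was roughly the
following. The create and fix-layout lines are reconstructed from
memory rather than pasted from my shell history, so treat this as a
sketch; the add-brick and rebalance start/stop forms are the same ones
that appear in the transcript further down.

  # 4-brick distributed volume, all bricks on nas3
  gluster vol create test-volume nas3-10g:/data9/gvol nas3-10g:/data10/gvol \
      nas3-10g:/data11/gvol nas3-10g:/data12/gvol
  gluster vol start test-volume

  # add a 5th brick from the second server
  gluster vol add-brick test-volume nas4-10g:/data13/gvol

  # rewrite the layout so new files can land on the new brick, then
  # migrate existing data; the stop is what led to the confusion below
  gluster vol rebalance test-volume fix-layout start
  gluster vol rebalance test-volume start
  gluster vol rebalance test-volume stop
]
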
> >> Even with it stopped, it wouldn't let me add a brick, so I tried
> >> restarting it, but it wouldn't let me do that either. I presume you
> >> just reissue the start command, as there's no restart?
> >>
> >> [root@nas3 ~]# gluster vol rebalance test-volume status
> >>      Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
> >> ---------  ----------------  -------  -------  --------  -------  ---------  ----------------
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >>  nas4-10g                 0   0Bytes     1506         0        0  completed              8.00
> >> volume rebalance: test-volume: success:
> >> [root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
> >> volume add-brick: failed: Volume name test-volume rebalance is in progress. Please retry after completion
> >> [root@nas3 ~]# gluster vol rebalance test-volume start
> >> volume rebalance: test-volume: failed: Rebalance on test-volume is already started
> >>
> >> In the end I used the force option to make it start, but was that the
> >> right thing to do?
> >>
> >> glusterfs 3.4.1 built on Oct 28 2013 11:01:59
> >>
> >> Volume Name: test-volume
> >> Type: Distribute
> >> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
> >> Status: Started
> >> Number of Bricks: 5
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nas3-10g:/data9/gvol
> >> Brick2: nas3-10g:/data10/gvol
> >> Brick3: nas3-10g:/data11/gvol
> >> Brick4: nas3-10g:/data12/gvol
> >> Brick5: nas4-10g:/data13/gvol
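
PS: Going by Kaushal's explanation above, the reliable check before the
next add-brick is to make sure the rebalance process itself has exited,
not just that the status column says 'stopped'. Something along these
lines should do (a rough sketch; the exact arguments the daemon is
started with may vary):

  # a glusterfs process still carrying 'rebalance' in its arguments
  # means a migration is still in flight -- wait until this returns
  # nothing before running add-brick
  ps ax | grep -i '[r]ebalance'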