On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> Hi Franco,
>
> If a file is under migration and a rebalance stop is encountered,
> then the rebalance process exits only after the completion of the
> migration.
>
> That might be one of the reasons why you saw the "rebalance is in
> progress" message while trying to add the brick.

The status said it was stopped. I didn't run top on the machine, but
are you saying that it was still rebalancing despite reporting that it
had stopped?

> Could you please share the average file size in your setup?

Bit hard to say; I just copied some data over from our main processing
system. The sizes range from very small to tens of gigabytes.

> You could always check the rebalance status command to ensure
> rebalance has indeed completed/stopped before proceeding with the
> add-brick. Using add-brick force while rebalance is on-going should
> not be used in normal scenarios. I do see that in your case they show
> stopped/completed. Glusterd logs would help in triaging the issue.

See attached.

> Rebalance re-writes layouts and migrates data. While this is
> happening, if an add-brick is done, the cluster might go into an
> imbalanced state. Hence the check for rebalance being in progress
> while doing add-brick.

I can see that, but as far as I could tell the rebalance had stopped
according to the status. Just to be clear, what command restarts the
rebalancing?
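My guess is that there's no separate restart sub-command and you
simply reissue start (with force as a last resort, which is what I
ended up doing below); please correct me if that's wrong:

    # check that every node reports stopped/completed first
    gluster vol rebalance test-volume status
    # then start a fresh rebalance; there is no "restart"
    gluster vol rebalance test-volume start
    # last resort only, as used below
    gluster vol rebalance test-volume start force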
> With regards,
> Shishir
>
> On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx> wrote:
>
>     Before attempting a rebalance on my existing distributed Gluster
>     volume, I thought I'd do some testing with my new storage. I
>     created a volume consisting of 4 bricks on the same server and
>     wrote some data to it. I then added a new brick from another
>     server. I ran the fix-layout, wrote some new files, and could see
>     them on the new brick. All good so far, so I started the data
>     rebalance. After it had been running for a while I wanted to add
>     another brick, which I obviously couldn't do while it was running,
>     so I stopped it. Even with it stopped it wouldn't let me add a
>     brick, so I tried restarting it, but it wouldn't let me do that
>     either. I presume you just reissue the start command, as there's
>     no restart?
>
>     [root@nas3 ~]# gluster vol rebalance test-volume status
>          Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
>     ---------  ----------------  -------  -------  --------  -------  ---------  ----------------
>     localhost                 7  611.7GB     1358         0       10    stopped           4929.00
>     localhost                 7  611.7GB     1358         0       10    stopped           4929.00
>     nas4-10g                  0   0Bytes     1506         0        0  completed              8.00
>     volume rebalance: test-volume: success:
>
>     [root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
>     volume add-brick: failed: Volume name test-volume rebalance is in
>     progress. Please retry after completion
>
>     [root@nas3 ~]# gluster vol rebalance test-volume start
>     volume rebalance: test-volume: failed: Rebalance on test-volume is
>     already started
>
>     In the end I used the force option to make it start, but was that
>     the right thing to do?
>
>     glusterfs 3.4.1 built on Oct 28 2013 11:01:59
>
>     Volume Name: test-volume
>     Type: Distribute
>     Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
>     Status: Started
>     Number of Bricks: 5
>     Transport-type: tcp
>     Bricks:
>     Brick1: nas3-10g:/data9/gvol
>     Brick2: nas3-10g:/data10/gvol
>     Brick3: nas3-10g:/data11/gvol
>     Brick4: nas3-10g:/data12/gvol
>     Brick5: nas4-10g:/data13/gvol
>
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users@xxxxxxxxxxx
>     http://supercolony.gluster.org/mailman/listinfo/gluster-users
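For what it's worth, here's the sequence I intend to follow next time,
assuming the behaviour you describe; the commands are plain 3.4 CLI,
but the polling step is just my own sketch:

    # poll until no node shows "in progress" - stop can lag while an
    # in-flight file migration finishes
    watch -n 10 'gluster vol rebalance test-volume status'
    # add the new brick without force
    gluster vol add-brick test-volume nas4-10g:/data14/gvol
    # then start the rebalance again from scratch
    gluster vol rebalance test-volume start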
Attachment:
etc-glusterfs-glusterd.vol.log.gz
Description: application/gzip
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users