Re: Rebalance times in 3.2.5 vs 3.4.2

Hi Viktor,

Thanks for the tips.  I'm a bit confused, since the clients mount the share fine, and "gluster peer status" and "gluster volume status all detail" are happy.

What is the expected output of "rebalance status" for a fix-layout-only run?  I believe the last time I did that, the status counters stayed at 0 (which makes some sense, since no files are moved) and the log was empty, but the operation seemed to complete successfully.  Does a file rebalance first require an internal fix-layout pass, and is it possible that my volume is still in that phase?  Or am I making up an overly optimistic scenario?
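For what it's worth, this is roughly the liveness check I'm relying on, as a sketch (the pid-file location is from memory; on a real node it lives under the glusterd state directory as vols/<volname>/rebalance/<hash>.pid, and the demo default below is just so the script runs standalone):

```shell
#!/bin/sh
# Sketch: verify that the rebalance daemon named in the .pid file is alive.
# PIDFILE defaults to a demo file here; on a real node, point it at
# vols/<volname>/rebalance/<hash>.pid under the glusterd state directory.
PIDFILE="${PIDFILE:-/tmp/demo-rebalance.pid}"

# Demo setup only: if no pid file exists, record this shell's own pid.
[ -f "$PIDFILE" ] || echo $$ > "$PIDFILE"

pid=$(cat "$PIDFILE")
# kill -0 sends no signal; it only tests that the process exists.
if kill -0 "$pid" 2>/dev/null; then
    status=running
else
    status=dead
fi
echo "rebalance pid $pid is $status"
```

Of course, a live process only proves the daemon hasn't died, not that it is making progress.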

Thanks,

Matt


On Thu, Feb 27, 2014 at 8:33 PM, Viktor Villafuerte <viktor.villafuerte@xxxxxxxxxxxxxxx> wrote:
Hi Matt,

if the 'status' says 0 for everything, that's not good. Normally when I
do a rebalance the numbers keep changing (upwards). Also, the rebalance
log should show files being moved around.

For the errors - my (limited) experience with Gluster is that the 'W'
warnings are normally harmless and they show up quite a bit. For the
actual error 'E' you could try playing with 'auth.allow' as suggested here

http://gluster.org/pipermail/gluster-users/2011-November/009094.html


Normally when rebalancing I count the files on the bricks and on the
Gluster mount to make sure they eventually add up. I also grep the file
listing and watch the '-T' (link file) count go down as the 'rw'
(regular file) count goes up.
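Roughly, the brick-side count looks like this sketch (the brick path is just an example, and the demo setup fabricates one link file and one regular file so the counts are visible; DHT link files show up as mode ---------T, i.e. the sticky bit with no rwx bits):

```shell
#!/bin/sh
# Sketch: count DHT link files vs. regular files on a brick.
# BRICK is an example path; substitute your real brick directory.
BRICK="${BRICK:-/tmp/demo-brick}"

# Demo setup only -- on a real brick, skip this block.
mkdir -p "$BRICK"
echo data > "$BRICK/file1"            # a fully migrated, regular file
touch "$BRICK/file2"
chmod 1000 "$BRICK/file2"             # ---------T: looks like a DHT link file

# Link files have mode 1000 exactly (sticky bit only, no rwx bits).
links=$(find "$BRICK" -type f -perm 1000 | wc -l)
regular=$(find "$BRICK" -type f ! -perm 1000 | wc -l)
echo "link files: $links  regular files: $regular"
```

During a healthy rebalance the link-file count should fall over time while the regular-file count rises.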

v




On Thu 27 Feb 2014 00:57:28, Matt Edwards wrote:
> Hopefully I'm not derailing this thread too far, but I have a related
> rebalance progress/speed issue.
>
> I have a rebalance process started that's been running for 3-4 days.  Is
> there a good way to see if it's running successfully, or might this be a
> sign of some problem?
>
> This is on a 4-node distribute setup with v3.4.2 and 45T of data.
>
> The *-rebalance.log has been silent since some informational messages when
> the rebalance started.  There were a few initial warnings and errors that I
> observed, though:
>
>
> E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0:
> SETVOLUME on remote-host failed: Authentication failed
>
> W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4:
> failed to set the volume (Permission denied)
>
> W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4:
> failed to get 'process-uuid' from reply dict
>
> W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data
> available)
>
>
> "gluster volume status" reports that the rebalance is in progress, the
> process listed in vols/<volname>/rebalance/<hash>.pid is still running on
> the server, but "gluster volume rebalance <volname> status" reports 0 for
> everything (files scanned or rebalanced, failures, run time).
>
> Thanks,
>
> Matt
>
>
> On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar <shmohan@xxxxxxxxxx> wrote:
>
> > Hi Viktor,
> >
> > Lots of optimizations and improvements went in for 3.4 so it should be
> > faster than 3.2.
> > Just to make sure what's happening, could you please check the rebalance
> > logs in /var/log/glusterfs/<volname>-rebalance.log and see whether there
> > is any progress?
> >
> > Thanks,
> > Shylesh
> >
> >
> > Viktor Villafuerte wrote:
> >
> >> Can anybody confirm or dispute whether this is normal?
> >>
> >> v
> >>
> >>
> >> On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have a distributed replicated set with 2 servers (replicas) and am
> >>> trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)
> >>>
> >>> I have about 23G of data which I copy onto the first replica, check
> >>> everything, and then add the other set of replicas and eventually
> >>> rebalance with fix-layout and migrate-data.
> >>>
> >>> Now on
> >>>
> >>> Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
> >>>
> >>> on
> >>>
> >>> Gluster v3.4.2 this has been running for almost 4 hours and it's still
> >>> not finished
> >>>
> >>>
> >>> As I may have to do this in production, where the amount of data is
> >>> significantly larger than 23G, I'm looking at about three weeks of wait
> >>> to rebalance :)
> >>>
> >>> Now my question is whether this is how it's meant to be. I can see
> >>> that v3.4.2 gives me more info about the rebalance process etc., but
> >>> that surely cannot justify the enormous time difference.
> >>>
> >>> Is this normal/expected behaviour? If so, I will have to stick with
> >>> v3.2.5 as it seems way quicker.
> >>>
> >>> Please let me know if there is any 'well known' option/way/secret to
> >>> speed up the rebalance on v3.4.2.
> >>>
> >>>
> >>> thanks
> >>>
> >>>
> >>>
> >>> --
> >>> Regards
> >>>
> >>> Viktor Villafuerte
> >>> Optus Internet Engineering
> >>> t: 02 808-25265
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users@xxxxxxxxxxx
> >>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>
> >

--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

