Re: Rebalance times in 3.2.5 vs 3.4.2

Matt Edwards <matted@xxxxxxx> · Thu, 27 Feb 2014 00:57:28 -0500

Hopefully I'm not derailing this thread too far, but I have a related rebalance progress/speed issue.
I started a rebalance that has now been running for 3-4 days.  Is there a good way to tell whether it is actually making progress, or might this be a sign of a problem?

This is on a 4-node distribute setup with v3.4.2 and 45T of data.

The *-rebalance.log has been silent since a few informational messages were logged when the rebalance started.  I did observe a few initial warnings and errors, though:

E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0: SETVOLUME on remote-host failed: Authentication failed

W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4: failed to set the volume (Permission denied)

W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4: failed to get 'process-uuid' from reply dict

W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data available)
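
A small sketch for tallying messages like the ones above by severity and client, in case it helps spot which brick connection is complaining. It assumes the shortened line form quoted in this message (severity letter, [file:line:function], translator name, message); the optional leading "[timestamp]" handling is an assumption about the full log format, and summarize() is just an illustrative helper name.

```python
import re
from collections import Counter

# Matches lines in the shortened form quoted above, e.g.
#   E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0: message
# A leading "[timestamp] " is tolerated (assumption about the full log format).
LOG_LINE = re.compile(
    r"^(?:\[[^\]]*\]\s+)?"         # optional "[timestamp] " prefix
    r"(?P<level>[A-Z])\s+"         # single-letter severity (E, W, I, ...)
    r"\[(?P<source>[^\]]+)\]\s+"   # file:line:function
    r"(?P<origin>\S+):\s+"         # translator name, e.g. 0-cluster2-client-0
    r"(?P<message>.*)$"
)


def summarize(lines):
    """Count (level, origin) pairs among the lines that match LOG_LINE."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("origin"))] += 1
    return counts


# The four lines quoted in the message above.
sample = [
    "E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0: "
    "SETVOLUME on remote-host failed: Authentication failed",
    "W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4: "
    "failed to set the volume (Permission denied)",
    "W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4: "
    "failed to get 'process-uuid' from reply dict",
    "W [socket.c:514:__socket_rwv] 0-cluster2-client-3: "
    "readv failed (No data available)",
]

for (level, origin), n in sorted(summarize(sample).items()):
    print(level, origin, n)
```

Lines that do not match the pattern are skipped silently, so the counts are only a rough guide.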

"gluster volume status" reports that the rebalance is in progress, and the process listed in vols/<volname>/rebalance/<hash>.pid is still running on the server.  However, "gluster volume rebalance <volname> status" reports 0 for everything (files scanned or rebalanced, failures, run time).
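
For reference, a minimal sketch of the "is the process in the pid file still running" check, assuming a POSIX host and a pid file that holds a single integer. The pid file path is left as a parameter because the full path is not given here; read_pid() and pid_is_running() are illustrative helper names, not part of Gluster.

```python
import os


def read_pid(pid_file):
    """Return the integer PID stored in pid_file."""
    with open(pid_file) as fh:
        return int(fh.read().strip())


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 only performs the existence/permission check;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A live PID only shows that the process has not exited; it says nothing about whether files are being scanned or migrated.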

Thanks,
Matt

On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar <shmohan@xxxxxxxxxx> wrote:

Hi Viktor,

Lots of optimizations and improvements went into 3.4, so it should be faster than 3.2.

To see what is happening, could you please check the rebalance log at
/var/log/glusterfs/<volname>-rebalance.log and see whether there is any progress?
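
One rough way to check for progress is to see how long ago the log was last written. The sketch below assumes that a long-idle log hints at a stalled rebalance, which may not hold if the process is simply quiet; seconds_since_modified() and is_stale() are illustrative names, and the idle threshold is something you would pick yourself.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, max_idle_seconds, now=None):
    """Return True if the file has not been modified for max_idle_seconds."""
    return seconds_since_modified(path, now) > max_idle_seconds
```

For example, is_stale("/var/log/glusterfs/<volname>-rebalance.log", 3600), with the real volume name substituted, would report whether the log has been idle for more than an hour.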

Thanks,

Shylesh

Viktor Villafuerte wrote:

Can anybody confirm or dispute whether this is normal or abnormal?

v

On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:

Hi all,

I have a distributed replicated set with 2 servers (replicas) and am
trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)

I have about 23G of data which I copy onto the first replica, check
everything and then add the other set of replicas and eventually
rebalance fix-layout, migrate-data.

Now on
Gluster v3.2.5 this took about 30 mins (to rebalance + migrate-data)
on
Gluster v3.4.2 this has been running for almost 4 hours and it's still
not finished

As I may have to do this in production, where the amount of data is
significantly larger than 23G, I'm looking at about three weeks of wait
to rebalance :)
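
The kind of extrapolation behind such an estimate can be sketched as below. It assumes rebalance time scales linearly with data size, which is a simplification, and it treats the 4 hours on v3.4.2 as a lower bound because that run had not finished. TARGET_GB is a made-up production size for illustration only; the real figure is not given in this thread.

```python
def estimate_hours(sample_gb, sample_hours, target_gb):
    """Scale an observed rebalance time linearly to a different data size."""
    if sample_gb <= 0:
        raise ValueError("sample_gb must be positive")
    return sample_hours * target_gb / sample_gb


# Figures from the message: 23 GB, about 0.5 h on v3.2.5, at least 4 h on v3.4.2.
# TARGET_GB is hypothetical and used only to show the calculation.
TARGET_GB = 1000

print(f"v3.2.5: about {estimate_hours(23, 0.5, TARGET_GB):.1f} h")
print(f"v3.4.2: at least {estimate_hours(23, 4, TARGET_GB):.1f} h")
```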

Now my question is whether this is as it's meant to be. I can see that v3.4.2
gives me more info about the rebalance process etc, but that surely
cannot justify the enormous time difference.

Is this normal/expected behaviour? If so I will have to stick with
v3.2.5 as it seems way quicker.

Please let me know if there is any 'well known' option/way/secret to
speed the rebalance up on v3.4.2.

thanks

-- 
Regards
Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://supercolony.gluster.org/mailman/listinfo/gluster-users
