Re: Rebalance times in 3.2.5 vs 3.4.2


 



Hopefully I'm not derailing this thread too far, but I have a related rebalance progress/speed issue.

I started a rebalance that has been running for 3-4 days. Is there a good way to tell whether it's progressing, or could this be a sign of a problem?

This is on a 4-node distribute setup with v3.4.2 and 45T of data.

The *-rebalance.log has been silent since a few informational messages logged when the rebalance started. I did see a few warnings and errors early on, though:


E [client-handshake.c:1397:client_setvolume_cbk] 0-cluster2-client-0: SETVOLUME on remote-host failed: Authentication failed

W [client-handshake.c:1365:client_setvolume_cbk] 0-cluster2-client-4: failed to set the volume (Permission denied)

W [client-handshake.c:1391:client_setvolume_cbk] 0-cluster2-client-4: failed to get 'process-uuid' from reply dict

W [socket.c:514:__socket_rwv] 0-cluster2-client-3: readv failed (No data available)
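
The lines above follow the usual GlusterFS log layout: a one-letter severity, a `[source.c:line:function]` tag, the translator name, then the message. To check whether new warnings or errors keep appearing in the rebalance log, a small Python sketch can tally severities. It assumes exactly the layout shown here, plus an optional leading `[timestamp]` as real log files usually carry; `tally_severities` is a hypothetical helper, not part of GlusterFS.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above, not an official grammar):
#   optional "[timestamp] ", severity letter, "[file.c:line:func]", "<xlator>: <message>"
LOG_LINE = re.compile(
    r"^(?:\[[^\]]*\]\s+)?"      # optional leading [timestamp]
    r"(?P<sev>[A-Z])\s+"        # one-letter severity, e.g. E, W, I
    r"\[(?P<src>[^\]]+)\]\s+"   # [source.c:line:function]
    r"(?P<xlator>\S+):\s+"      # translator name, e.g. 0-cluster2-client-0
    r"(?P<msg>.*)$"             # free-form message
)

def tally_severities(lines):
    """Count log lines per severity letter; lines that don't match are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[m.group("sev")] += 1
    return counts
```

Run over the four lines above, it counts 3 `W` and 1 `E`; passing an open log file (e.g. `tally_severities(open(path))`) works the same way, line by line.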


"gluster volume status" reports that the rebalance is in progress, and the process listed in vols/<volname>/rebalance/<hash>.pid is still running on the server, but "gluster volume rebalance <volname> status" reports 0 for everything (files scanned, files rebalanced, failures, run time).
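
A running PID only shows that the rebalance process exists, not that it is migrating anything, which is consistent with the zero counters. For the existence check itself, a minimal POSIX-only Python sketch reads the pidfile and probes the PID with signal 0. The path is whatever vols/<volname>/rebalance/<hash>.pid resolves to on the server; `pid_from_file` and `process_alive` are hypothetical helpers.

```python
import os

def pid_from_file(path):
    """Read the first whitespace-separated token of a pidfile as an integer PID."""
    with open(path) as f:
        return int(f.read().split()[0])

def process_alive(pid):
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)          # signal 0 delivers nothing; it only checks existence/permission
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # process exists but belongs to another user
    return True
```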

Thanks,

Matt



On Thu, Feb 27, 2014 at 12:39 AM, Shylesh Kumar <shmohan@xxxxxxxxxx> wrote:
Hi Viktor,

Lots of optimizations and improvements went into 3.4, so it should be faster than 3.2.
To see what's happening, could you please check the rebalance log, which will be in
/var/log/glusterfs/<volname>-rebalance.log, and see whether there is any progress?
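
One rough way to check for progress without reading the whole log is to see how long ago it was last written. A minimal Python sketch, assuming the log path above and an arbitrary one-hour threshold; a quiet log is only a hint, and the counters from "gluster volume rebalance <volname> status" remain the better indicator. Both functions are hypothetical helpers.

```python
import os
import time

def seconds_since_last_write(path, now=None):
    """Seconds elapsed since the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)

def log_looks_stalled(path, threshold_s=3600, now=None):
    """Heuristic only: True if the file hasn't been written for more than `threshold_s` seconds."""
    return seconds_since_last_write(path, now) > threshold_s
```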

Thanks,
Shylesh


Viktor Villafuerte wrote:
Can anybody confirm or dispute whether this is normal?

v


On Tue 25 Feb 2014 15:21:40, Viktor Villafuerte wrote:
Hi all,

I have a distributed-replicated set with 2 servers (replicas) and am
trying to add another set of replicas: 1 x (1x1) => 2 x (1x1)

I have about 23G of data, which I copy onto the first replica. I check
everything, then add the other set of replicas and finally run
rebalance fix-layout and migrate-data.

On Gluster v3.2.5 this took about 30 minutes (rebalance + migrate-data).

On Gluster v3.4.2 it has been running for almost 4 hours and still
hasn't finished.


As I may have to do this in production, where the amount of data is
significantly larger than 23G, I'm looking at about three weeks of
waiting for the rebalance :)
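
The timings above can be turned into a simple linear projection. On v3.2.5, 23G in 30 minutes is about 46 GB/h; on v3.4.2, 23G in at least 4 hours is at most 5.75 GB/h, i.e. at least 8x slower. A minimal Python sketch, assuming time scales linearly with data size (rebalance cost plausibly depends on file count as well, so treat this as a rough estimate); the function names are hypothetical:

```python
def throughput_gb_per_hour(data_gb, hours):
    """Average migration rate observed in a test run."""
    return data_gb / hours

def projected_hours(total_gb, data_gb, hours):
    """Linear extrapolation: hours needed for `total_gb` at the test-run rate."""
    return total_gb / throughput_gb_per_hour(data_gb, hours)

# Test runs from this thread (v3.4.2 figure is a lower bound: the run hadn't finished)
rate_325 = throughput_gb_per_hour(23, 0.5)   # 46.0 GB/h
rate_342 = throughput_gb_per_hour(23, 4)     # 5.75 GB/h
```

For example, `projected_hours(2898, 23, 4)` gives 504 hours, so three weeks at the observed v3.4.2 rate would cover only about 2.9 TB. Because the 4-hour run had not finished, the real rate may be lower still, which makes every projection here a lower bound.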

Now my question is whether this is how it's meant to be. I can see that
v3.4.2 gives me more information about the rebalance process, but that
surely cannot justify such an enormous time difference.

Is this normal/expected behaviour? If so, I will have to stick with
v3.2.5, as it seems much quicker.

Please let me know if there is any 'well known' option/way/secret to
speed up the rebalance on v3.4.2.


thanks



--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users


