Hi,

Rebalance will abort itself if it cannot reach any of the nodes. Are all the bricks still up and reachable?

Regards,
Nithya

Yes, the bricks appear to be fine. I restarted the rebalance and the process is moving along again:
# gluster vol rebalance tank status
Node       Rebalanced-files  size      scanned   failures  skipped  status       run time in h:m:s
---------  ----------------  --------  --------  --------  -------  -----------  -----------------
localhost  226973            14.9TB    1572952   0         0        in progress  44:26:48
serverB    0                 0Bytes    631667    0         0        completed    37:2:14
volume rebalance: tank: success
# df -hP |grep data
/dev/mapper/gluster_vg-gluster_lv1_data 60T 24T 36T 40% /gluster_bricks/data1
/dev/mapper/gluster_vg-gluster_lv2_data 60T 24T 36T 40% /gluster_bricks/data2
/dev/mapper/gluster_vg-gluster_lv3_data 60T 17T 43T 29% /gluster_bricks/data3
/dev/mapper/gluster_vg-gluster_lv4_data 60T 17T 43T 29% /gluster_bricks/data4
/dev/mapper/gluster_vg-gluster_lv5_data 60T 19T 41T 31% /gluster_bricks/data5
/dev/mapper/gluster_vg-gluster_lv6_data 60T 19T 41T 31% /gluster_bricks/data6
Thanks,
HB
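
For anyone who wants to check rebalance status from a script rather than by eye, here is a minimal Python sketch that parses output like the status tables pasted in this thread. It assumes the column layout shown here (node, rebalanced files, size, scanned, failures, skipped, status, run time); other Gluster versions may print it differently. The names `ROW` and `parse_rebalance_status` are made up for illustration and are not part of any Gluster tooling.

```python
import re

# One data row: node, four counters around a size token, a status that
# may contain spaces ("in progress"), and a run time in h:m:s.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)\s+"
    r"(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)\s+"
    r"(?P<runtime>\d+:\d+:\d+)\s*$"
)


def parse_rebalance_status(text):
    """Return one dict per node row; header, separator and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # not a data row
        row = match.groupdict()
        for key in ("files", "scanned", "failures", "skipped"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


# Sample copied from the failed run reported in this thread.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 1348706 57.8TB 2234439 9 6 failed 190:24:3
serverB 0 0Bytes 7 0 0 completed 63:47:55
volume rebalance: tank: success
"""

for row in parse_rebalance_status(SAMPLE):
    if row["status"] == "failed" or row["failures"] > 0:
        print(f"{row['node']}: status={row['status']}, failures={row['failures']}")
```

Note that the final "volume rebalance: tank: success" line only says the status command itself succeeded, so the sketch looks at the per-node status column instead.
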
# gluster vol rebalance tank status
Node       Rebalanced-files  size      scanned   failures  skipped  status       run time in h:m:s
---------  ----------------  --------  --------  --------  -------  -----------  -----------------
localhost  1348706           57.8TB    2234439   9         6        failed       190:24:3
serverB    0                 0Bytes    7         0         0        completed    63:47:55
volume rebalance: tank: success

# gluster vol status tank
Status of volume: tank
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick serverA:/gluster_bricks/data1   49162     0          Y       20318
Brick serverB:/gluster_bricks/data1   49166     0          Y       3432
Brick serverA:/gluster_bricks/data2   49163     0          Y       20323
Brick serverB:/gluster_bricks/data2   49167     0          Y       3435
Brick serverA:/gluster_bricks/data3   49164     0          Y       4625
Brick serverA:/gluster_bricks/data4   49165     0          Y       4644
Brick serverA:/gluster_bricks/data5   49166     0          Y       5088
Brick serverA:/gluster_bricks/data6   49167     0          Y       5128
Brick serverB:/gluster_bricks/data3   49168     0          Y       22314
Brick serverB:/gluster_bricks/data4   49169     0          Y       22345
Brick serverB:/gluster_bricks/data5   49170     0          Y       22889
Brick serverB:/gluster_bricks/data6   49171     0          Y       22932
Self-heal Daemon on localhost         N/A       N/A        Y       6202
Self-heal Daemon on serverB           N/A       N/A        Y       22981
Task Status of Volume tank
------------------------------------------------------------------------------
Task : Rebalance
ID : eec64343-8e0d-4523-ad05-5678f9eb9eb2
Status : failed

# df -hP |grep data
/dev/mapper/gluster_vg-gluster_lv1_data 60T 31T 29T 52% /gluster_bricks/data1
/dev/mapper/gluster_vg-gluster_lv2_data 60T 31T 29T 51% /gluster_bricks/data2
/dev/mapper/gluster_vg-gluster_lv3_data 60T 15T 46T 24% /gluster_bricks/data3
/dev/mapper/gluster_vg-gluster_lv4_data 60T 15T 46T 24% /gluster_bricks/data4
/dev/mapper/gluster_vg-gluster_lv5_data 60T 15T 45T 25% /gluster_bricks/data5
/dev/mapper/gluster_vg-gluster_lv6_data 60T 15T 45T 25% /gluster_bricks/data6

The rebalance log on serverA shows a disconnect from serverB:

[2019-09-08 15:41:44.285591] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server <serverB>:49170 has not responded in the last 42 seconds, disconnecting.
[2019-09-08 15:41:44.285739] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from tank-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff986c52aae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7ff986c54220] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2b0)[0x7ff986c54ce0] ))))) 0-tank-client-10: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2019-09-08 15:40:44.040333 (xid=0x7f8cfac)

Does this type of failure cause data corruption? What is the best course of action at this point?

Thanks,
HB
_______________________________________________

On Wed, Sep 11, 2019 at 11:58 PM Strahil <hunter86_bg@xxxxxxxxx> wrote:

Hi Nithya,
Thanks for the detailed explanation.
It makes sense.

Best Regards,
Strahil Nikolov

On Sep 12, 2019 08:18, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On Wed, 11 Sep 2019 at 09:47, Strahil <hunter86_bg@xxxxxxxxx> wrote:

Hi Nithya,
I was just reminded of your previous e-mail, which left me with the impression that old volumes need that.
This is the one I mean:
>It looks like this is a replicate volume. If that is the case then yes, you are
>running an old version of Gluster for which this was the default behaviour.
>
>Regards,
>Nithya

Hi Strahil,

I'm providing a little more detail here which I hope will explain things.

Rebalance was always a volume-wide operation: a rebalance start operation will start rebalance processes on all nodes of the volume. However, different processes would behave differently. In earlier releases, all nodes would crawl the bricks and update the directory layouts, but only one node in each replica/disperse set would actually migrate files, so the rebalance status would only show one node doing any "work" (scanning, rebalancing etc). That one node will process all the files in its replica sets. Rerunning rebalance on other nodes would make no difference, as it will always be the same node that ends up migrating files.

So for instance, for a replicate volume with server1:/brick1, server2:/brick2 and server3:/brick3 in that order, only the rebalance process on server1 would migrate files. In newer releases, all 3 nodes would migrate files.

The rebalance status does not capture the directory operations of fixing layouts, which is why it looks like the other nodes are not doing anything.

Hope this helps.

Regards,
Nithya
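
To make Nithya's server1/server2/server3 example concrete, here is a small Python sketch of which nodes end up migrating files under the older and newer behaviour she describes. It is a toy model, not Gluster code: it assumes bricks are grouped into replica sets in the order they are listed, and that in older releases the node holding the first brick of each set is the one that migrates (the e-mail only states this for the three-brick example). The function names and the `all_nodes_migrate` flag are invented for illustration.

```python
from typing import List


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group bricks, in volume order, into consecutive replica sets."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def node_of(brick: str) -> str:
    """Return the host part of a 'host:/path' brick name."""
    return brick.split(":", 1)[0]


def migrating_nodes(bricks: List[str], replica: int, all_nodes_migrate: bool) -> List[str]:
    """List the nodes whose rebalance process would migrate files.

    all_nodes_migrate=False models the older behaviour (ASSUMPTION: the
    first brick of each replica set picks the node); True models the
    newer behaviour where every node in the set migrates files.
    """
    nodes: List[str] = []
    for rset in replica_sets(bricks, replica):
        chosen = rset if all_nodes_migrate else rset[:1]
        for brick in chosen:
            node = node_of(brick)
            if node not in nodes:
                nodes.append(node)
    return nodes


bricks = ["server1:/brick1", "server2:/brick2", "server3:/brick3"]
print(migrating_nodes(bricks, 3, all_nodes_migrate=False))  # ['server1']
print(migrating_nodes(bricks, 3, all_nodes_migrate=True))   # ['server1', 'server2', 'server3']
```

In the older case only server1 shows up, which matches a status table where a single node reports all the rebalanced files while the others show 0.
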
Best Regards,
Strahil Nikolov

On Sep 9, 2019 06:36, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

On Sat, 7 Sep 2019 at 00:03, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:

As it was mentioned, you might have to run rebalance on the other node - but it is better to wait until this node is finished.

Hi Strahil,

Rebalance does not need to be run on the other node - the operation is a volume-wide one. Only a single node per replica set would migrate files in the version used in this case.

Regards,
Nithya

Best Regards,
Strahil Nikolov

On Friday, 6 September 2019, 15:29:20 GMT+3, Herb Burnswell <herbert.burnswell@xxxxxxxxx>
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
________

Community Meeting Calendar:

APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314