Re: Gluster 5.10 rebalance stuck

Hi Gluster Devs,
Any leads on the above? We are kinda stuck at the moment.

On Mon, Nov 7, 2022 at 2:13 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Hi Dev list,

How can I find the details about the rebalance_status/status ids? Is it actually normal that some systems are in '4' and others in '3'?

Is it safe to forcefully start a new rebalance?

Best Regards,
Strahil Nikolov 

On Mon, Nov 7, 2022 at 9:15, Shreyansh Shah wrote:
Hi Strahil,
Adding the info below:

--------------------------------------
Node IP = 10.132.0.19
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=27054
size=7104425578505
scanned=72141
failures=10
skipped=19611
run-time=92805.000000
--------------------------------------
Node IP = 10.132.0.20
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=23945
size=7126809216060
scanned=71208
failures=7
skipped=18834
run-time=94029.000000
--------------------------------------
Node IP = 10.132.1.12
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=12533
size=12945021256
scanned=40398
failures=14
skipped=1194
run-time=92201.000000
--------------------------------------
Node IP = 10.132.1.13
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=41483
size=8845076025598
scanned=179920
failures=25
skipped=62373
run-time=130017.000000
--------------------------------------
Node IP = 10.132.1.14
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=43603
size=7834691799355
scanned=204140
failures=2878
skipped=87761
run-time=130016.000000
--------------------------------------
Node IP = 10.132.1.15
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=29968
size=6389568855140
scanned=69320
failures=7
skipped=17999
run-time=93654.000000
--------------------------------------
Node IP = 10.132.1.16
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=23226
size=5899338197718
scanned=56169
failures=7
skipped=12659
run-time=94030.000000
--------------------------------------
Node IP = 10.132.1.17
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=17538
size=6247281008602
scanned=50038
failures=8
skipped=11335
run-time=92203.000000
--------------------------------------
Node IP = 10.132.1.18
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=20394
size=6395008466977
scanned=50060
failures=7
skipped=13784
run-time=92103.000000
--------------------------------------
Node IP = 10.132.1.19
rebalance_status=1
status=1
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=0
failures=0
skipped=0
run-time=0.000000
--------------------------------------
Node IP = 10.132.1.20
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.000000
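
For anyone comparing the per-node values above, here is a minimal Python sketch that parses a dump in this layout and groups the nodes by their raw `status` value. It assumes the blocks are separated by dashed lines and start with a `Node IP = ...` line, as in this message. The function names `parse_node_states` and `group_by_status` and the shortened `SAMPLE` text are illustrative only and are not part of Gluster. The sketch does not interpret the numeric ids, since their meaning is not confirmed in this thread.

```python
from collections import defaultdict

# Shortened, hypothetical sample in the same layout as the dump above.
SAMPLE = """
--------------------------------------
Node IP = 10.132.0.19
rebalance_status=1
status=4
failures=10
--------------------------------------
Node IP = 10.132.1.13
rebalance_status=1
status=3
failures=25
--------------------------------------
Node IP = 10.132.1.19
rebalance_status=1
status=1
failures=0
"""


def parse_node_states(text):
    """Parse 'Node IP = x' blocks of key=value lines into {ip: {key: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed separators between blocks.
        if not line or line.startswith("-----"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if key == "Node IP":
            current = nodes.setdefault(value, {})
        elif current is not None:
            current[key] = value  # values are kept as raw strings
    return nodes


def group_by_status(nodes):
    """Group node IPs by the raw 'status' field, without interpreting it."""
    groups = defaultdict(list)
    for ip, fields in nodes.items():
        groups[fields.get("status", "?")].append(ip)
    return dict(groups)


if __name__ == "__main__":
    for status, ips in group_by_status(parse_node_states(SAMPLE)).items():
        print(f"status={status}: {', '.join(ips)}")
```

Applied to the full dump above, this grouping gives seven nodes with status=4, three with status=3 (10.132.1.13, 10.132.1.14, 10.132.1.20), and a single node, 10.132.1.19, with status=1 and all counters at zero.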

On Thu, Nov 3, 2022 at 10:10 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
And the other servers ?

On Thu, Nov 3, 2022 at 16:21, Shreyansh Shah wrote:
Hi Strahil,
Thank you for your reply. node_state.info has the below data

root@gluster-11:/usr/var/lib/glusterd/vols/data# cat node_state.info
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.000000



On Thu, Nov 3, 2022 at 4:00 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
I would check the details in /var/lib/glusterd/vols/<VOLUME_NAME>/node_state.info

Best Regards,
Strahil Nikolov 

On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah wrote:
Hi,
I would really appreciate it if someone could help with the above issue. We are stuck: we cannot run a rebalance because of this, so the data stays unbalanced and we cannot get peak performance out of the setup.
Adding gluster info (without the bricks) below. Please let me know if any other details/logs are needed.

Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 41
Transport-type: tcp
Options Reconfigured:
server.event-threads: 4
network.ping-timeout: 90
client.keepalive-time: 60
server.keepalive-time: 60
storage.health-check-interval: 60
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.cache-size: 8GB
performance.cache-refresh-timeout: 60
cluster.min-free-disk: 3%
client.event-threads: 4
performance.io-thread-count: 16


On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah <shreyansh.shah@xxxxxxxxxxxxxx> wrote:
Hi,
We are running a glusterfs 5.10 server volume. Recently we added a few new bricks and started a rebalance operation. After a couple of days the rebalance got stuck: one of the peers showed In-Progress with no files being read or transferred, and the rest showed Failed/Completed, so we stopped it using "gluster volume rebalance data stop". Now, when we try to start it again, we get the errors below. Any assistance would be appreciated.

root@gluster-11:~# gluster volume rebalance data status
volume rebalance: data: failed: Rebalance not started for volume data.
root@gluster-11:~# gluster volume rebalance data start
volume rebalance: data: failed: Rebalance on data is already started
root@gluster-11:~# gluster volume rebalance data stop
volume rebalance: data: failed: Rebalance not started for volume data.
 
--
Regards,
Shreyansh Shah

AlphaGrep Securities Pvt. Ltd.


Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel
