Rebalance Issues

Shreyansh Shah <shreyansh.shah@xxxxxxxxxxxxxx> · Fri, 12 Nov 2021 12:01:02 +0530

Hi All,

I have a distributed GlusterFS 5.10 setup with 8 nodes, each with one 1 TB disk and three 4 TB disks (so 22 TB in total per node).
Recently I added a new node with 3 additional disks (1 x 10TB + 2 x 8TB). After this I ran a rebalance, and it does not seem to complete successfully (the output of gluster volume rebalance data status is below). On a few nodes it shows as failed, and on the node where it shows as completed the data is not evenly balanced.

root@gluster6-new:~# gluster v rebalance data status
                                     Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            22836         2.4TB        136149             1         27664          in progress       14:48:56
                             10.132.1.15               80         5.0MB          1134             3           121               failed        1:08:33
                             10.132.1.14            18573         2.5TB        137827            20         31278          in progress       14:48:56
                             10.132.1.12              607        61.3MB          1667             5            60               failed        1:08:33
      gluster4.c.storage-186813.internal            26479         2.8TB        148402            14         38271          in progress       14:48:56
                             10.132.1.18               86         6.4MB          1094             5            70               failed        1:08:33
                             10.132.1.17            21953         2.6TB        131573             4         26818          in progress       14:48:56
                             10.132.1.16               56        45.0MB          1203             5           111               failed        1:08:33
                             10.132.0.19             3108         1.9TB        224707             2        160148            completed       13:56:31
Estimated time left for rebalance to complete :       22:04:28
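
To make the status output above easier to compare between runs, here is a minimal sketch that parses the rows and lists the nodes whose status is "failed". It assumes the column layout shown above; the function name parse_rebalance_status and the SAMPLE string are only illustrative and are not part of gluster.

```python
import re

# A few rows copied from the status output above (illustrative sample only).
SAMPLE = """
                               localhost            22836         2.4TB        136149             1         27664          in progress       14:48:56
                             10.132.1.15               80         5.0MB          1134             3           121               failed        1:08:33
                             10.132.0.19             3108         1.9TB        224707             2        160148            completed       13:56:31
Estimated time left for rebalance to complete :       22:04:28
"""

# One data row: node, files, size, scanned, failures, skipped, status, run time.
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)"
    r"\s+(?P<runtime>\d+:\d+:\d+)\s*$"
)


def parse_rebalance_status(text):
    """Return one dict per data row; header and summary lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, separator, blank line, or "Estimated time" line
        row = match.groupdict()
        for key in ("files", "scanned", "failures", "skipped"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_rebalance_status(SAMPLE):
        if row["status"] == "failed":
            print(row["node"], "failed after", row["runtime"],
                  "with", row["failures"], "failures")
```

In the full output above, the four nodes marked as failed all show the same run time (1:08:33), while the nodes still in progress all show 14:48:56.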

Below is the 'df -h' output for the node that is marked as completed in the status output above; the data does not seem to be evenly balanced.
root@gluster-9:~$ df -h /data*
Filesystem      Size  Used Avail Use% Mounted on
/dev/bcache0     10T  8.9T  1.1T  90% /data
/dev/bcache1    8.0T  5.0T  3.0T  63% /data1
/dev/bcache2    8.0T  5.0T  3.0T  63% /data2

I would appreciate any help to identify the issues here:

1. Failures during rebalance.
2. Imbalance in data size after the gluster rebalance command.
3. We had to run the rebalance twice, because in the initial run one of the new disks on the new node (10 TB) became 100% full. Any thoughts as to why this could happen during rebalance? The disks on the new node were completely blank before the rebalance.
4. Does glusterfs rebalance data based on percentage used or absolute free disk space available?
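
To illustrate what question 4 is asking, here is a small arithmetic sketch of the two possible meanings of "even", using the rounded sizes from the df -h output above. It looks only at the three bricks of the new node and makes no claim about how glusterfs actually places data; the function names are hypothetical.

```python
# Sizes in TB, taken from the `df -h` output above (df rounds these values).
bricks = {
    "/data": {"size": 10.0, "used": 8.9},
    "/data1": {"size": 8.0, "used": 5.0},
    "/data2": {"size": 8.0, "used": 5.0},
}


def even_by_percentage(bricks):
    """Target usage per brick if every brick were filled to the same percentage."""
    total_size = sum(b["size"] for b in bricks.values())
    total_used = sum(b["used"] for b in bricks.values())
    fraction = total_used / total_size
    return {name: b["size"] * fraction for name, b in bricks.items()}


def even_by_free_space(bricks):
    """Target usage per brick if every brick were left with the same free space."""
    total_size = sum(b["size"] for b in bricks.values())
    total_used = sum(b["used"] for b in bricks.values())
    free_each = (total_size - total_used) / len(bricks)
    return {name: b["size"] - free_each for name, b in bricks.items()}


if __name__ == "__main__":
    by_pct = even_by_percentage(bricks)
    by_free = even_by_free_space(bricks)
    for name, b in bricks.items():
        print(f"{name}: actual {b['used']:.2f} TB, "
              f"equal-percentage {by_pct[name]:.2f} TB, "
              f"equal-free-space {by_free[name]:.2f} TB")
```

With these numbers, /data would hold roughly 7.3 TB under equal percentage and roughly 7.6 TB under equal free space, compared with the 8.9 TB it holds now.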

I can share more details/logs if required. Thanks.

-- 
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
________

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users