Re: Is rebalance in progress or not?

On 3/15/20 11:07 AM, Strahil Nikolov wrote:
On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev <ailiev+gluster@xxxxxxxxx> wrote:
Hi list,

I was having some issues with one of my Gluster nodes so I ended up
re-installing it. Now I want to re-add the bricks for my main volume
and I'm having the following issue. When I try to add the bricks I get:

# gluster volume add-brick store1 replica 3 <bricks ...>
volume add-brick: failed: Pre Validation failed on 172.31.35.132. Volume name store1 rebalance is in progress. Please retry after completion

But then if I get the rebalance status I get:

# gluster volume rebalance store1 status
volume rebalance: store1: failed: Rebalance not started for volume store1.

And if I try to start the rebalancing I get:

# gluster volume rebalance store1 start
volume rebalance: store1: failed: Rebalance on store1 is already started

Looking at the logs of the first node, when I try to start the
rebalance operation I see this:

[2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67

On the second node the logs are showing stuff that indicates that a
rebalance operation is indeed in progress:

[2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
[2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
[2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs


Some background on what led to this situation:

The volume was originally a replica 3 distributed replicated volume on
three nodes. In order to detach the faulty node I lowered the replica
count to 2 and removed the bricks from that node from the volume. I
cleaned up the storage (formatted the bricks and cleaned the
trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
purged the gluster packages from the system, then I re-installed the
gluster packages and did a `gluster peer probe` from another node.

I'm running Gluster 6.6 on CentOS 7.7 on all nodes.

I feel stuck at this point, so any guidance will be greatly
appreciated.

Thanks!

Best regards,

Hey  Alex,

Did you try to go to the second node (the one that thinks a rebalance is running) and stop the rebalance?

gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count), and after the heal is over, rebalance again.
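
For reference, the sequence suggested above could be sketched roughly as the dry run below. It only prints the commands in order rather than running them, so nothing touches the cluster. The function name and the brick paths are hypothetical placeholders, and the heal step shown is just the usual status check (repeat it until nothing is pending), not a guaranteed completion test.

```shell
#!/bin/sh
# Dry-run sketch: prints the gluster commands for the suggested sequence
# instead of executing them. plan_brick_readd is a hypothetical helper name.

plan_brick_readd() {
    vol="$1"
    replica="$2"
    shift 2
    # 1. Stop the rebalance that the node believes is running.
    echo "gluster volume rebalance $vol stop"
    # 2. Re-add the bricks while raising the replica count.
    echo "gluster volume add-brick $vol replica $replica $*"
    # 3. Check for pending heals; re-run until nothing is left to heal.
    echo "gluster volume heal $vol info"
    # 4. Start the rebalance again once the heal has finished.
    echo "gluster volume rebalance $vol start"
}

# Brick paths below are made up for illustration only.
plan_brick_readd store1 3 node3:/bricks/b1 node3:/bricks/b2
```

Piping the output to a shell would run the steps for real, but reviewing each printed command and running it by hand is the safer choice here.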

Hey Strahil,

Thanks for the suggestion. I just tried it, but unfortunately the result is pretty much the same: when I try to stop the rebalance on the second node, it reports that no rebalance is in progress:

> # gluster volume rebalance store1 stop
> volume rebalance: store1: failed: Rebalance not started for volume store1.


Best Regards,
Strahil Nikolov


Best regards,
--
alexander iliev