On March 15, 2020 12:16:51 PM GMT+02:00, Alexander Iliev <ailiev+gluster@xxxxxxxxx> wrote:
>On 3/15/20 11:07 AM, Strahil Nikolov wrote:
>> On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev <ailiev+gluster@xxxxxxxxx> wrote:
>>> Hi list,
>>>
>>> I was having some issues with one of my Gluster nodes, so I ended up
>>> re-installing it. Now I want to re-add the bricks for my main volume
>>> and I'm having the following issue - when I try to add the bricks I get:
>>>
>>>> # gluster volume add-brick store1 replica 3 <bricks ...>
>>>> volume add-brick: failed: Pre Validation failed on 172.31.35.132.
>>>> Volume name store1 rebalance is in progress. Please retry after completion
>>>
>>> But then if I get the rebalance status I get:
>>>
>>>> # gluster volume rebalance store1 status
>>>> volume rebalance: store1: failed: Rebalance not started for volume store1.
>>>
>>> And if I try to start the rebalancing I get:
>>>
>>>> # gluster volume rebalance store1 start
>>>> volume rebalance: store1: failed: Rebalance on store1 is already started
>>>
>>> Looking at the logs of the first node, when I try to start the rebalance
>>> operation I see this:
>>>
>>>> [2020-03-15 09:41:31.883651] E [MSGID: 106276]
>>>> [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received
>>>> stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67
>>>
>>> On the second node the logs indicate that a rebalance operation is
>>> indeed in progress:
>>>
>>>> [2020-03-15 09:47:34.190042] I [MSGID: 109081]
>>>> [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
>>>> [2020-03-15 09:47:34.775691] I
>>>> [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data
>>>> called on /redacted
>>>> [2020-03-15 09:47:36.019403] I
>>>> [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration
>>>> operation on dir /redacted took 1.24 secs
>>>
>>> Some background on what led to this situation:
>>>
>>> The volume was originally a replica 3 distributed-replicated volume on
>>> three nodes. In order to detach the faulty node I lowered the replica
>>> count to 2 and removed that node's bricks from the volume. I cleaned up
>>> the storage (formatted the bricks and removed the trusted.gfid and
>>> trusted.glusterfs.volume-id extended attributes), purged the gluster
>>> packages from the system, then re-installed the gluster packages and
>>> did a `gluster peer probe` from another node.
>>>
>>> I'm running Gluster 6.6 on CentOS 7.7 on all nodes.
>>>
>>> I feel stuck at this point, so any guidance will be greatly appreciated.
>>>
>>> Thanks!
>>>
>>> Best regards,
>>
>> Hey Alex,
>>
>> Did you try to go to the second node (the one that thinks a rebalance is
>> running) and stop the rebalance?
>>
>> gluster volume rebalance VOLNAME stop
>>
>> Then add the new brick (and increase the replica count), and after the
>> heal is over - rebalance again.
>
>Hey Strahil,
>
>Thanks for the suggestion, I just tried it, but unfortunately the result
>is pretty much the same - when I try to stop the rebalance on the second
>node it reports that no rebalance is in progress:
>
> > # gluster volume rebalance store1 stop
> > volume rebalance: store1: failed: Rebalance not started for volume store1.
>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>
>Best regards,
>--
>alexander iliev

Hey Alex,

I'm not sure if the command has a 'force' flag, but if it does - it is worth trying:

gluster volume rebalance store1 stop force
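
If the stop does go through, the rest should be the sequence from my previous mail - re-add the bricks, wait for the heal, then rebalance. Roughly like this (just a sketch, assuming the new bricks on the reinstalled node are already formatted and mounted, and with <bricks ...> standing for your actual brick paths):

gluster volume add-brick store1 replica 3 <bricks ...>
gluster volume heal store1 info
gluster volume rebalance store1 start
gluster volume rebalance store1 status

Wait until 'heal info' shows no pending entries before starting the rebalance.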
Sadly, as the second node thinks a rebalance is running, I'm not sure if a 'start force' (to convince both nodes that a rebalance is running) and then a 'stop' will have the expected effect.

This situation is hard to reproduce, so in any case a bug report should be opened.

Keep in mind that I do not have a distributed volume, so everything above is pure speculation.

Based on my experience, a gluster upgrade can fix odd situations like that, but it could also make things worse. So for now avoid any upgrades, until a dev confirms it is safe to do.

Best Regards,
Strahil Nikolov

________

Community Meeting Calendar:

Schedule -
Every Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users