On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev <ailiev+gluster@xxxxxxxxx> wrote:
> Hi list,
>
> I was having some issues with one of my Gluster nodes so I ended up
> re-installing it. Now I want to re-add the bricks for my main volume and
> I'm having the following issue - when I try to add the bricks I get:
>
> > # gluster volume add-brick store1 replica 3 <bricks ...>
> > volume add-brick: failed: Pre Validation failed on 172.31.35.132.
> > Volume name store1 rebalance is in progress. Please retry after completion
>
> But then if I check the rebalance status I get:
>
> > # gluster volume rebalance store1 status
> > volume rebalance: store1: failed: Rebalance not started for volume store1.
>
> And if I try to start the rebalance I get:
>
> > # gluster volume rebalance store1 start
> > volume rebalance: store1: failed: Rebalance on store1 is already started
>
> Looking at the logs of the first node, when I try to start the rebalance
> operation I see this:
>
> > [2020-03-15 09:41:31.883651] E [MSGID: 106276] [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67
>
> On the second node the logs show entries indicating that a rebalance
> operation is indeed in progress:
>
> > [2020-03-15 09:47:34.190042] I [MSGID: 109081] [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of /redacted
> > [2020-03-15 09:47:34.775691] I [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate data called on /redacted
> > [2020-03-15 09:47:36.019403] I [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration operation on dir /redacted took 1.24 secs
>
> Some background on what led to this situation:
>
> The volume was originally a replica 3 distributed-replicated volume on
> three nodes. In order to detach the faulty node I lowered the replica
> count to 2 and removed that node's bricks from the volume. I cleaned up
> the storage (formatted the bricks and removed the trusted.gfid and
> trusted.glusterfs.volume-id extended attributes), purged the gluster
> packages from the system, then re-installed the gluster packages and did
> a `gluster peer probe` from another node.
>
> I'm running Gluster 6.6 on CentOS 7.7 on all nodes.
>
> I feel stuck at this point, so any guidance will be greatly appreciated.
>
> Thanks!
>
> Best regards,

Hey Alex,

Did you try to go to the second node (the one that thinks a rebalance is
running) and stop the rebalance?

    gluster volume rebalance VOLNAME stop

Then add the new brick (and increase the replica count), and after the
heal is over - rebalance again.

Best Regards,
Strahil Nikolov
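
Spelled out as commands, that recovery sequence might look roughly like the
sketch below. This is only an illustration: "store1" is the volume name from
the report, but the hostname "node3" and the brick paths are placeholders,
not values from the original thread, and the number of bricks added has to
match the volume's distribute-replicate layout.

    # On the node that still reports a rebalance in progress:
    gluster volume rebalance store1 stop

    # Re-add the bricks from the re-installed node and raise the replica
    # count back to 3 (node3 and the brick paths are placeholders):
    gluster volume add-brick store1 replica 3 node3:/bricks/brick1 node3:/bricks/brick2

    # Wait for self-heal to finish; repeat until no entries are pending:
    gluster volume heal store1 info summary

    # Only then start a fresh rebalance and watch its progress:
    gluster volume rebalance store1 start
    gluster volume rebalance store1 status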