While writing a test for the patch fix of BZ https://bugzilla.redhat.com/show_bug.cgi?id=1560957 I just can't make my test case to pass where a replace brick commit force always fails on a multi node cluster and that's on the latest mainline code.
The fix is a one liner:
atin@dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1
diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index af30756c9..24d813fbd 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,
* TBD: re-use RPC connection across bricks
*/
if (is_brick_mx_enabled ()) {
+ brickinfo->port_registered = _gf_true;
ret = glusterd_get_sock_from_brick_pid (pid, socketpath,
sizeof(socketpath));
if (ret) {
atin@dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1
diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index af30756c9..24d813fbd 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,
* TBD: re-use RPC connection across bricks
*/
if (is_brick_mx_enabled ()) {
+ brickinfo->port_registered = _gf_true;
ret = glusterd_get_sock_from_brick_pid (pid, socketpath,
sizeof(socketpath));
if (ret) {
The test does the following:
#!/bin/bash
. $(dirname $0)/../../include.rc
. $(dirname $0)/../../cluster.rc
. $(dirname $0)/../../volume.rc
cleanup;
TEST launch_cluster 3;
TEST $CLI_1 peer probe $H2;
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
TEST $CLI_1 peer probe $H3;
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count
TEST $CLI_1 volume set all cluster.brick-multiplex on
TEST $CLI_1 volume create $V0 replica 3 $H1:$B1/${V0}1 $H2:$B2/${V0}1 $H3:$B3/${V0}1
TEST $CLI_1 volume start $V0
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}1
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}1
#bug-1560957 - replace brick followed by an add-brick in a brick mux setup
#brings down one brick instance
kill_glusterd 3
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
TEST $CLI_1 volume replace-brick $V0 $H1:$B1/${V0}1 $H1:$B1/${V0}1_new commit force
this is where the test always fails saying "volume replace-brick: failed: Commit failed on localhost. Please check log file for details.
TEST $glusterd_3
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count
TEST $CLI_1 volume add-brick $V0 replica 3 $H1:$$B1/${V0}3 $H2:$B1/${V0}3 $H3:$B1/${V0}3 commit force
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $H3:$B1/${V0}1
cleanup;
glusterd log from 1st node
[2018-03-27 13:11:58.630845] E [MSGID: 106053] [glusterd-utils.c:13889:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.replace-brick : Transport endpoint is not connected [Transport endpoint is not connected]
_______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-devel