replace-brick commit force fails in multi node cluster

While writing a test for the fix for BZ https://bugzilla.redhat.com/show_bug.cgi?id=1560957, I can't get my test case to pass: replace-brick commit force always fails on a multi-node cluster, and that's on the latest mainline code.

The fix is a one-liner:

atin@dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1
diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c b/xlators/mgmt/glusterd/src/glusterd-utils.c
index af30756c9..24d813fbd 100644
--- a/xlators/mgmt/glusterd/src/glusterd-utils.c
+++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
@@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,
                          * TBD: re-use RPC connection across bricks
                          */
                         if (is_brick_mx_enabled ()) {
+                                brickinfo->port_registered = _gf_true;
                                 ret = glusterd_get_sock_from_brick_pid (pid, socketpath,
                                                                         sizeof(socketpath));
                                 if (ret) {




The test does the following:

#!/bin/bash                                                                       
                                                                                  
. $(dirname $0)/../../include.rc                                                  
. $(dirname $0)/../../cluster.rc                                                  
. $(dirname $0)/../../volume.rc                                                   
                                                                                  
                                                                                  
cleanup;                                                                          
                                                                                  
TEST launch_cluster 3;                                                            
                                                                                  
TEST $CLI_1 peer probe $H2;                                                       
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count                                         
                                                                                  
TEST $CLI_1 peer probe $H3;                                                       
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count                                         
                                                                                  
TEST $CLI_1 volume set all cluster.brick-multiplex on                             
                                                                                  
TEST $CLI_1 volume create $V0 replica 3 $H1:$B1/${V0}1 $H2:$B2/${V0}1 $H3:$B3/${V0}1
                                                                                  
TEST $CLI_1 volume start $V0                                                      
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1 $B1/${V0}1        
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2 $B2/${V0}1        
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $B3/${V0}1        
                                                                                  
                                                                                  
#bug-1560957 - replace brick followed by an add-brick in a brick mux setup        
#brings down one brick instance                                                   
                                                                                  
kill_glusterd 3                                                                   
EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count                                         
TEST $CLI_1 volume replace-brick $V0 $H1:$B1/${V0}1 $H1:$B1/${V0}1_new commit force

# This is where the test always fails, saying: "volume replace-brick: failed: Commit failed on localhost. Please check log file for details."
                                                                                  
TEST $glusterd_3                                                                  
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count                                         
                                                                                  
TEST $CLI_1 volume add-brick $V0 replica 3 $H1:$B1/${V0}3 $H2:$B1/${V0}3 $H3:$B1/${V0}3 commit force
                                                                                  
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3 $H3:$B1/${V0}1 
cleanup;  

glusterd log from the 1st node:
[2018-03-27 13:11:58.630845] E [MSGID: 106053] [glusterd-utils.c:13889:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.replace-brick : Transport endpoint is not connected [Transport endpoint is not connected]
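
For anyone trying to reproduce this, here is a minimal sketch for pulling only the error-level entries out of a glusterd log. The helper name glusterd_errors is hypothetical and not part of the test framework, and the pattern assumes the "[timestamp] E [MSGID: n]" layout seen in the line above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gluster test framework):
# print only error-level ("E") entries from a gluster log file.
# Assumes each entry starts with "[YYYY-MM-DD HH:MM:SS.usec] LEVEL [MSGID: n]",
# as in the log line quoted above.
glusterd_errors() {
    local logfile=$1
    grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+\] E \[MSGID: [0-9]+\]' "$logfile"
}
```

It would be called as `glusterd_errors <path-to-glusterd-log>`, with the log path of the first node filled in for the setup at hand.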

Requesting some help/attention from the AFR folks.