OK, so the log just hints at the following:
[2017-07-05 15:04:07.178204] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123] [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases] 0-management: Commit Op Failed
While going through the code, glusterd_op_reset_brick () failed, resulting in these logs. However, I don't see any error logs generated from glusterd_op_reset_brick (), which makes me think that we may have failed at a place where the failure is only logged at debug level. Would you be able to restart the glusterd service with debug log mode, rerun this test and share the log?
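In case it helps, here is a rough sketch of one way to do that on a node (this assumes the stock RPM packaging, where glusterd.service reads LOG_LEVEL from /etc/sysconfig/glusterd and glusterd writes its log to /var/log/glusterfs/glusterd.log; the reset-brick arguments below are only illustrative placeholders, not necessarily the exact command you ran):

# raise glusterd's log level to DEBUG and restart the daemon
sed -i "s/^#\?LOG_LEVEL=.*/LOG_LEVEL='DEBUG'/" /etc/sysconfig/glusterd
systemctl restart glusterd

# reproduce the failure, e.g. rerun the reset-brick operation
gluster volume reset-brick export gl01.localdomain.local:/gluster/brick3/export \
    gl01.localdomain.local:/gluster/brick3/export commit force

# compress the debug-level glusterd log so it can be attached/shared
gzip -c /var/log/glusterfs/glusterd.log > glusterd-debug-log.gz

(Alternatively, glusterd can be started by hand with --log-level=DEBUG if editing the sysconfig file is not convenient.)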
On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi <gianluca.cecchi@xxxxxxxxx> wrote:
On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

    And what does glusterd log indicate for these failures?

See here in gzip format

It seems that on each host the peer files have been updated with a new entry "hostname2":

[root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt01 ~]#

[root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt02 ~]#

[root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
[root@ovirt03 ~]#

But not the gluster info on the second and third node that have lost the ovirt01/gl01 host brick information...

Eg on ovirt02

[root@ovirt02 peers]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt02 peers]#

And on ovirt03

[root@ovirt03 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt03 ~]#

While on ovirt01 it seems isolated...

[root@ovirt01 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 1
Transport-type: tcp
Bricks:
Brick1: gl01.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt01 ~]#
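For what it's worth, one quick way to confirm that glusterd's stored volume definitions really have diverged (and not just the CLI output) is to compare the on-disk copy of the volume on each node; the paths below assume the default /var/lib/glusterd layout:

# run on ovirt01, ovirt02 and ovirt03 and compare the results
grep -E '^(count|brick)' /var/lib/glusterd/vols/export/info
cat /var/lib/glusterd/vols/export/cksum

If the brick count and checksum on ovirt01 differ from those on ovirt02/ovirt03, that would match the mismatched "Number of Bricks" shown above.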