Re: halo does not work as desired


 



I have mounted the halo glusterfs volume in debug mode, and the output is as follows:
.
.
.
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0] [afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0] [afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client ping @ 140032933708544 ms
.
.
.
[2018-02-05 11:42:48.393776] D [MSGID: 0] [afr-common.c:4803:find_worst_up_child] 0-test-halo-replicate-0: Found worst up child (1) @ 140032933708544 ms latency
[2018-02-05 11:42:48.393803] D [MSGID: 0] [afr-common.c:4903:__afr_handle_child_up_event] 0-test-halo-replicate-0: Marking child 1 down, doesn't meet halo threshold (10), and > halo_min_replicas (2)
.
.
.

I think this debug output means the following:
The ping time for test-halo-client-1 (brick2) is about 0.5 ms, well under the halo threshold (10 ms), yet halo treats it as exceeding the threshold (note the implausibly large logged latency of 140032933708544 ms for a 0.5 ms ping) and so makes the wrong brick-selection decision.
I cannot set the halo threshold to 0, because:

#gluster vol set test-halo cluster.halo-max-latency 0
volume set: failed: '0' in 'option halo-max-latency 0' is out of range [1 - 99999]

So I think the range [1 - 99999] should be changed to [0 - 99999], so that I can get the desired brick selection from the halo feature. Am I right? If not, why does halo decide to mark down the best brick, which has a ping time below 0.5 ms?
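For reference, the decision that the "Marking child 1 down" log message describes can be modeled roughly like this. This is only my simplified sketch of the check, not the actual AFR source; the threshold and min-replicas values are the ones shown in the log output above:

```python
def should_mark_down(latency_ms, halo_max_latency_ms, up_children, halo_min_replicas):
    """Simplified model of the halo check: a child is marked down when its
    observed latency exceeds the halo threshold AND marking it down still
    leaves more than halo_min_replicas children up."""
    too_slow = latency_ms > halo_max_latency_ms
    enough_replicas = up_children > halo_min_replicas
    return too_slow and enough_replicas

# With the real 0.5 ms ping, the brick should stay up against a 10 ms threshold:
print(should_mark_down(0.5, 10, up_children=4, halo_min_replicas=2))              # False
# With the bogus latency value from the log, the brick gets marked down:
print(should_mark_down(140032933708544, 10, up_children=4, halo_min_replicas=2))  # True
```

If this model is right, the brick should only be marked down because of the absurd latency value in the log, not because of the real 0.5 ms ping.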

On Sun, Feb 4, 2018 at 2:27 PM, atris adam <atris.adam@xxxxxxxxx> wrote:
I have 2 data centers in two different regions; each DC has 3 servers. I have created a glusterfs volume with replica 4. This is the gluster volume info output:


Volume Name: test-halo
Type: Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/mnt/test1
Brick2: 10.0.0.3:/mnt/test2
Brick3: 10.0.0.5:/mnt/test3
Brick4: 10.0.0.6:/mnt/test4
Options Reconfigured:
cluster.halo-shd-max-latency: 5
cluster.halo-max-latency: 10
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.halo-enabled: yes
transport.address-family: inet
nfs.disable: on

Bricks with IPs 10.0.0.1 & 10.0.0.3 are in region A, and bricks with IPs 10.0.0.5 & 10.0.0.6 are in region B.


When I mount the volume in region A, I expect the data to be stored first on brick1 & brick2, and then copied asynchronously to region B, onto brick3 & brick4.

Am I right? Is this what halo claims to do?

If yes, then unfortunately this does not happen for me: no matter whether I mount the volume in region A or in region B, all the data is copied to brick3 & brick4, and no data is copied to brick1 & brick2.

Ping times to the brick IPs from region A are as follows:
ping 10.0.0.1 & 10.0.0.3: below time=0.500 ms
ping 10.0.0.5 & 10.0.0.6: more than time=20 ms

What is the logic by which halo selects the bricks to write to? If it is access latency, then when I mount the volume in region A the ping time to brick1 & brick2 is below 0.5 ms, yet halo selects brick3 & brick4!
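For what it's worth, the behavior I expected can be sketched like this. This is my own simplified model of latency-based halo selection, not gluster code; the brick names and ping times are the ones measured from region A above:

```python
HALO_MAX_LATENCY_MS = 10  # cluster.halo-max-latency from the volume options

# Ping times measured from a client mounted in region A (ms)
brick_latency = {
    "brick1 (10.0.0.1)": 0.5,
    "brick2 (10.0.0.3)": 0.5,
    "brick3 (10.0.0.5)": 20.0,
    "brick4 (10.0.0.6)": 20.0,
}

# Expected halo behavior: write synchronously only to bricks whose latency
# is within the threshold; the remote bricks are brought up to date later.
sync_bricks = [b for b, lat in brick_latency.items() if lat <= HALO_MAX_LATENCY_MS]
print(sync_bricks)  # expect brick1 and brick2 to be selected
```

Under this model, a region-A client should write to brick1 & brick2, which is the opposite of what I observe.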

glusterfs version is:
glusterfs 3.12.4

I really need to use the halo feature, but I have not been able to get this case working. Can anyone help?


Thanks a lot

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
