I have checked further and mounted the volume in another region (region C). The ping times from region C are as follows:
ping 10.0.0.1 & 10.0.0.3 are below time=12 ms
ping 10.0.0.5 & 10.0.0.6 are more than time=32 ms

I expect the bricks with the lower ping time to be selected at write time, but the brick selection is still not as desired: the bricks with the higher ping time are selected. I changed cluster.halo-max-latency to 20, but this did not affect anything.

One more thing: the result in my previous email was not right. I thought that changing the range to [0-999999] would make everything OK, but my experience today shows that I was wrong.

Any help will be appreciated ;)
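
To make the expectation concrete, here is a minimal sketch of the selection behaviour I would expect from halo. It is only my reading of the debug messages quoted further down ("Found worst up child", "doesn't meet halo threshold", "> halo_min_replicas"), not the actual afr-common.c code. The function name and the latency values (12 and 32, standing in for "below 12 ms" and "more than 32 ms") are assumptions for illustration; the threshold of 20 and the minimum of 2 replicas come from this thread.

```python
def expected_up_bricks(latency_ms, max_latency_ms, min_replicas):
    """Return the bricks expected to stay up for writes (hypothetical helper).

    Assumed rule: while more than min_replicas bricks are up, look at the
    highest-latency brick and mark it down if it exceeds the threshold.
    """
    up = dict(latency_ms)
    while len(up) > min_replicas:
        worst = max(up, key=up.get)  # brick with the highest latency
        if up[worst] <= max_latency_ms:
            break  # even the worst brick meets the threshold
        del up[worst]  # mark the worst brick down
    return sorted(up)


# Representative ping times as seen from region C (illustrative values).
region_c = {"10.0.0.1": 12, "10.0.0.3": 12, "10.0.0.5": 32, "10.0.0.6": 32}

print(expected_up_bricks(region_c, max_latency_ms=20, min_replicas=2))
```

Under these assumptions only 10.0.0.1 and 10.0.0.3 remain up, which is the opposite of what I observe on the real volume.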
On Mon, Feb 5, 2018 at 4:04 PM, atris adam <atris.adam@xxxxxxxxx> wrote:
So I think the range [1 - 99999] should change to [0 - 99999] so that I can get the desired brick selection with the halo feature. Am I right? If not, why does halo decide to mark down the best brick, which has a ping time below 0.5 ms?

I cannot set the halo threshold to 0 because the value is rejected as out of range (see the command output further down).

I have mounted the halo glusterfs volume in debug mode, and the output is as follows:
.
.
.
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0] [afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0] [afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client ping @ 140032933708544 ms
.
.
.
[2018-02-05 11:42:48.393776] D [MSGID: 0] [afr-common.c:4803:find_worst_up_child] 0-test-halo-replicate-0: Found worst up child (1) @ 140032933708544 ms latency
[2018-02-05 11:42:48.393803] D [MSGID: 0] [afr-common.c:4903:__afr_handle_child_up_event] 0-test-halo-replicate-0: Marking child 1 down, doesn't meet halo threshold (10), and > halo_min_replicas (2)
.
.
.

I think this debug output means: the ping time for test-halo-client-1 (brick2) is 0.5 ms, yet it is treated as not meeting the halo threshold (10 ms), and that is why halo makes the wrong decision when selecting bricks.
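
As a cross-check of that reading, the sketch below parses the debug lines quoted above and reports any client whose measured ping is within the halo threshold while the latency value used by AFR is not. The regular expressions are written against the exact message text in this log only; the function name is hypothetical, and other glusterfs versions may format these messages differently.

```python
import re

# Lines copied from the debug output in this message.
LOG = """\
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0] [afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0] [afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client ping @ 140032933708544 ms
[2018-02-05 11:42:48.393776] D [MSGID: 0] [afr-common.c:4803:find_worst_up_child] 0-test-halo-replicate-0: Found worst up child (1) @ 140032933708544 ms latency
"""

PING_RE = re.compile(r"(\S+): Ping latency is (\d+)ms")
CLIENT_RE = re.compile(r"(\S+): Client ping @ (\d+) ms")
HALO_RE = re.compile(r"Using halo latency (\d+)")


def find_latency_mismatches(log_text):
    """Return (client, measured_ms, used_ms) for clients whose measured
    ping meets the halo threshold but whose AFR latency value does not."""
    measured, used, threshold = {}, {}, None
    for line in log_text.splitlines():
        m = PING_RE.search(line)
        if m:
            measured[m.group(1)] = int(m.group(2))
        m = CLIENT_RE.search(line)
        if m:
            used[m.group(1)] = int(m.group(2))
        m = HALO_RE.search(line)
        if m:
            threshold = int(m.group(1))
    if threshold is None:
        return []
    return [
        (client, measured[client], used[client])
        for client in sorted(used)
        if client in measured
        and measured[client] <= threshold < used[client]
    ]


for client, measured_ms, used_ms in find_latency_mismatches(LOG):
    print(f"{client}: measured {measured_ms} ms, AFR used {used_ms} ms")
```

For the lines above this flags test-halo-client-1: the RPC layer reports 0 ms, but the value AFR compares against the threshold is 140032933708544 ms, which is not a plausible latency.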
#gluster vol set test-halo cluster.halo-max-latency 0
volume set: failed: '0' in 'option halo-max-latency 0' is out of range [1 - 99999]

On Sun, Feb 4, 2018 at 2:27 PM, atris adam <atris.adam@xxxxxxxxx> wrote:
I have 2 data centers in two different regions, and each DC has 3 servers. I have created a glusterfs volume with 4 replicas. This is the glusterfs volume info output:

Volume Name: test-halo
Type: Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/mnt/test1
Brick2: 10.0.0.3:/mnt/test2
Brick3: 10.0.0.5:/mnt/test3
Brick4: 10.0.0.6:/mnt/test4
Options Reconfigured:
cluster.halo-shd-max-latency: 5
cluster.halo-max-latency: 10
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.halo-enabled: yes
transport.address-family: inet
nfs.disable: on

The bricks with IPs 10.0.0.1 & 10.0.0.3 are in region A, and the bricks with IPs 10.0.0.5 & 10.0.0.6 are in region B.

When I mount the volume in region A, I expect the data to be stored first on brick1 & brick2, and then copied asynchronously to region B, on brick3 & brick4. Am I right? Is this what halo claims?

If yes, unfortunately this does not happen for me. No matter whether I mount the volume in region A or in region B, all the data is copied to brick3 & brick4 and no data is copied to brick1 & brick2.

The ping times to the brick IPs from region A are as follows:
ping 10.0.0.1 & 10.0.0.3 are below time=0.500 ms
ping 10.0.0.5 & 10.0.0.6 are more than time=20 ms

What is the logic by which halo selects the bricks to write to? If it is the access time, then when I mount the volume in region A, the ping time to brick1 & brick2 is below 0.5 ms, but halo selects brick3 & brick4!

The glusterfs version is:
glusterfs 3.12.4

I really need to work with the halo feature, but I have not managed to get this case running. Can anyone help me soon?

Thanks a lot
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-users