Hi all,
I have two Ceph 13.2.6 clusters in a multisite setup on HDDs, with ~466.0 M objects and rather low usage: 63 MiB/s rd, 1.5 MiB/s wr, 978 op/s rd, 308 op/s wr.
In each cluster there are two dedicated RGWs for replication (set as the zone endpoints; the other RGWs have "rgw run sync thread = false"), split roughly as sketched below.
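For context, the split looks roughly like this in ceph.conf (the rgw.sync1/rgw.web1 section names are placeholders, not our real ones):

    # one of the two dedicated sync gateways, listed as the zone endpoints
    [client.rgw.sync1]
    rgw run sync thread = true

    # a client-facing gateway, excluded from sync
    [client.rgw.web1]
    rgw run sync thread = false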
Replication lag is about 15 seconds:
Master at Thu Aug 29 09:30:21 CEST 2019:
  metadata sync no sync (zone is master)
      data sync source: master_zone
          syncing
          full sync: 0/128 shards
          incremental sync: 128/128 shards
          data is behind on 35 shards
          behind shards: [8,10,11,29,34,37,45,46,52,54,56,57,58,59,60,62,70,77,78,80,81,82,88,89,90,94,96,97,105,109,112,119,122,125,127]
          oldest incremental change not applied: 2019-08-29 09:30:03.0.537216s
          70 shards are recovering
          recovering shards: [0,1,2,5,6,7,9,10,11,12,13,14,16,18,20,21,22,23,25,28,29,34,35,36,37,39,43,48,49,52,53,54,55,56,57,58,59,60,61,63,65,67,68,69,70,72,75,76,83,84,86,91,92,97,99,101,104,105,109,110,111,112,115,116,117,119,120,121,122,126]
Slave at Thu Aug 29 09:30:22 CEST 2019:
  metadata sync syncing
      full sync: 0/64 shards
      incremental sync: 64/64 shards
      metadata is caught up with master
  data sync source: slave_zone
      syncing
      full sync: 0/128 shards
      incremental sync: 128/128 shards
      data is behind on 24 shards
      behind shards: [11,12,15,18,20,35,51,57,59,60,67,82,83,84,86,89,93,97,105,108,120,122,125,127]
      oldest incremental change not applied: 2019-08-29 09:30:11.0.755569s
      64 shards are recovering
      recovering shards: [0,1,2,3,6,7,8,9,10,11,13,14,15,16,20,21,22,23,25,27,28,29,35,36,37,38,39,43,46,48,49,52,56,59,60,61,62,63,65,67,68,69,70,76,79,83,84,85,88,90,91,97,100,104,105,109,110,111,113,117,118,120,122,123]
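Both dumps above are "radosgw-admin sync status" output; I capture them with a small loop along these lines (the 10 s interval is arbitrary):

    # print a timestamped sync status every 10 seconds, on one rgw host per zone
    while true; do date; radosgw-admin sync status; sleep 10; done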
Is there anything I can do to speed up replication? Should I enable "rgw run sync thread" on all RGWs, not just the zone endpoints?
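In case it points somewhere useful, these are the other checks I know of (commands as I understand them; the zone name below is a placeholder for ours):

    # list objects that failed to sync
    radosgw-admin sync error list

    # per-source detail for the lagging data sync
    radosgw-admin data sync status --source-zone=master_zone

    # same for the metadata side
    radosgw-admin metadata sync status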
Best Regards,
Tom