FWIW, when using rbd-mirror to migrate volumes between SATA SSD clusters, I found that

rbd_mirror_journal_max_fetch_bytes:
  section: "client"
  value: "33554432"
rbd_journal_max_payload_bytes:
  section: "client"
  value: "8388608"

made a world of difference in expediting journal replay on Luminous 12.2.2. With the defaults, some active volumes would take hours to converge, and a couple were falling even further behind. This was mirroring 1 to 2 volumes at a time. YMMV.
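For anyone who wants that in plain ceph.conf form: the section/value syntax above
looks like a config-management override (ceph-ansible style), so the snippet below
is only my reading of it, not something lifted from that setup:

    [client]
    # rbd-mirror side: fetch up to 32 MiB of journal data per request
    rbd_mirror_journal_max_fetch_bytes = 33554432
    # writer side: allow up to 8 MiB per journal entry payload
    rbd_journal_max_payload_bytes = 8388608

As far as I understand it, the fetch-bytes option is read by the rbd-mirror daemon
on the destination cluster, while the payload-bytes option takes effect on the
librbd clients writing to the primary images, so both sides need the setting (and
a restart of the daemon/clients) to get the full benefit.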
> On Mar 10, 2020, at 7:36 AM, Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
>
> Hello Jason,
>
> thanks for that fast reply.
>
> This is now my /etc/ceph/ceph.conf:
>
> [client]
> rbd_mirror_journal_max_fetch_bytes = 4194304
>
> I stopped and started my rbd-mirror manually with:
> rbd-mirror -d -c /etc/ceph/ceph.conf
>
> Still the same result: slow speed shown by iftop, and entries_behind_master
> keeps increasing a lot if I produce 20 MB/sec of traffic on that
> replicated image.
>
> The latency is like:
> --- 10.10.50.1 ping statistics ---
> 100 packets transmitted, 100 received, 0% packet loss, time 20199ms
> rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
>
> iperf from the source node to the destination node (where the
> rbd-mirror runs): 8.92 Gbits/sec
>
> Any other idea?
>
> Thanks,
> Michael
>
>
> On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>
>> On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
>>>
>>> Hello List,
>>>
>>> when I initially enable journal/mirror on an image, it gets
>>> bootstrapped to my site-b pretty quickly at 250 MB/sec, which is about
>>> the IO write limit.
>>>
>>> Once it's up to date, the replay is very slow, about 15 KB/sec, and the
>>> entries_behind_master is just running away:
>>>
>>> root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
>>> health: OK
>>> images: 3 total
>>>     3 replaying
>>>
>>> ...
>>>
>>> vm-112-disk-0:
>>>   global_id:   60a795c3-9f5d-4be3-b9bd-3df971e531fa
>>>   state:       up+replaying
>>>   description: replaying, master_position=[object_number=623,
>>>     tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
>>>     tag_tid=3, entry_tid=18371], entries_behind_master=327196
>>>   last_update: 2020-03-10 11:36:44
>>>
>>> ...
>>>
>>> Write traffic on the source is about 20-25 MB/sec.
>>>
>>> On the source I run 14.2.6 and on the destination 12.2.13.
>>>
>>> Any idea why the replaying is so slow?
>>
>> What is the latency between the two clusters?
>>
>> I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
>> config setting (defaults to 32KiB) on your destination cluster, i.e.
>> try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
>> "[client]" section of your Ceph configuration file on the node where
>> the "rbd-mirror" daemon is running, and restart it. It defaults to a very
>> small read size from the remote cluster in a primitive attempt to
>> reduce the potential memory usage of the rbd-mirror daemon, but it has
>> the side effect of slowing down mirroring for links with higher
>> latencies.
>>
>>> Thanks,
>>> Michael
>>
>> --
>> Jason
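P.S. for anyone finding this thread later: a quick way to confirm that the
fetch-size change Jason describes actually reached the running rbd-mirror daemon
is to ask it over its admin socket and then watch entries_behind_master. The
socket name below is a guess for a typical deployment (it varies with cluster
name, client id and pid), so adjust it to whatever actually exists on the mirror
node:

    # find the daemon's admin socket
    ls /var/run/ceph/*rbd-mirror*.asok

    # ask the running daemon which fetch size it is really using
    ceph --admin-daemon /var/run/ceph/<that-socket>.asok \
        config get rbd_mirror_journal_max_fetch_bytes

    # then watch the lag counter over time
    watch -n 10 'rbd --cluster backup mirror pool status rbd-cluster6 --verbose'

If the daemon still reports the 32 KiB default, it is reading a different
ceph.conf (or was not restarted); if the value is right but
entries_behind_master keeps growing, the bottleneck is somewhere else
(destination write throughput, journal payload size on the primary, network
latency, etc.).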