Re: rbd-mirror replay is very slow - but initial bootstrap is fast

Hello Jason,

thanks for that fast reply.

This is now my /etc/ceph/ceph.conf

[client]
rbd_mirror_journal_max_fetch_bytes = 4194304


I stopped and started my rbd-mirror manually with:
rbd-mirror -d -c /etc/ceph/ceph.conf
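
I assume something like this would confirm that the restarted daemon
actually picked the setting up (the socket name is a guess, adjust it
to whatever is in /var/run/ceph on the rbd-mirror node):

ceph daemon /var/run/ceph/ceph-client.admin.<pid>.asok \
    config get rbd_mirror_journal_max_fetch_bytes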

Still the same result: iftop shows low throughput, and
entries_behind_master keeps increasing quickly when I generate about
20 MB/s of write traffic on the replicated image.
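
For what it is worth, the lag on a single image can also be watched
with something like this (image name taken from the status output
quoted below, and the 5 second interval is arbitrary):

watch -n 5 'rbd --cluster backup mirror image status rbd-cluster6/vm-112-disk-0'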

The latency between the two nodes looks like this:
 --- 10.10.50.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 20199ms
rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms

iperf from the source node to the destination node (where the
rbd-mirror daemon runs): 8.92 Gbits/sec

Any other ideas?

Thanks,
Michael



On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
> >
> > Hello List,
> >
> > When I initially enable journaling/mirroring on an image, it gets
> > bootstrapped to my site-b pretty quickly at about 250 MB/s, which is
> > roughly the write I/O limit.
> >
> > Once it is up to date, the replay is very slow (about 15 KB/s) and
> > entries_behind_master just keeps running away:
> >
> > root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
> > health: OK
> > images: 3 total
> >     3 replaying
> >
> > ...
> >
> > vm-112-disk-0:
> >   global_id:   60a795c3-9f5d-4be3-b9bd-3df971e531fa
> >   state:       up+replaying
> >   description: replaying, master_position=[object_number=623, tag_tid=3, entry_tid=345567], mirror_position=[object_number=35, tag_tid=3, entry_tid=18371], entries_behind_master=327196
> >   last_update: 2020-03-10 11:36:44
> >
> > ...
> >
> > Write traffic on the source is about 20-25 MB/s.
> >
> > On the source I run 14.2.6 and on the destination 12.2.13.
> >
> > Any idea why the replay is so slow?
>
> What is the latency between the two clusters?
>
> I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
> config setting (it defaults to 32 KiB) on your destination cluster,
> i.e. try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
> "[client]" section of your Ceph configuration file on the node where
> the "rbd-mirror" daemon is running, and restart it. The default is a
> very small read size from the remote cluster, a primitive attempt to
> reduce the potential memory usage of the rbd-mirror daemon, but it has
> the side effect of slowing down mirroring on links with higher
> latencies.
>
> >
> > Thanks,
> > Michael
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
>
> --
> Jason
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


