Re: rbd-mirror replay is very slow - but initial bootstrap is fast

On Tue, Mar 10, 2020 at 10:36 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
>
> Hello Jason,
>
> thanks for that fast reply.
>
> This is now my /etc/ceph/ceph.conf
>
> [client]
> rbd_mirror_journal_max_fetch_bytes = 4194304
>
>
> I stopped and started my rbd-mirror manually with:
> rbd-mirror -d -c /etc/ceph/ceph.conf
>
> Still the same result: iftop shows low transfer speed, and
> entries_behind_master keeps increasing rapidly when I produce
> 20 MB/sec of write traffic on the mirrored image.
>
> The latency is like:
>  --- 10.10.50.1 ping statistics ---
> 100 packets transmitted, 100 received, 0% packet loss, time 20199ms
> rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
>
> iperf from the source node to the destination node (where the
> rbd-mirror runs): 8.92 Gbits/sec
>
> Any other idea?

Do you know the average IO size against the primary image? Can you
create a similar image in the secondary cluster and run "fio" or "rbd
bench-write" against it with similar settings to verify that your
secondary cluster can handle the IO load? The initial image sync
issues large, whole-object writes, whereas journal replay re-issues
each write exactly as it was recorded in the journal, so lots of small
writes on the primary become lots of small replayed writes on the
secondary.
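
For example (just a sketch; substitute your real pool name, a scratch
image, and the IO size and queue depth that match your primary
workload):

  # "mirror-bench" is a placeholder image name; use any scratch image
  rbd --cluster backup create rbd-cluster6/mirror-bench --size 10G
  rbd --cluster backup bench-write rbd-cluster6/mirror-bench \
      --io-size 4K --io-threads 16 --io-total 1G --io-pattern rand

or, if your fio build includes the rbd engine:

  fio --name=mirror-bench --ioengine=rbd --clientname=admin \
      --pool=rbd-cluster6 --rbdname=mirror-bench --rw=randwrite \
      --bs=4k --iodepth=16 --runtime=60 --time_based

If the secondary cluster cannot sustain your ~20 MB/sec of such small
writes on its own, the bottleneck is the cluster itself rather than
the mirroring daemon.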

> Thanks,
> Michael
>
>
>
> On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hello List,
> > >
> > > when I initially enable journaling/mirroring on an image, it gets
> > > bootstrapped to my site-b pretty quickly at 250 MB/sec, which is
> > > about the IO write limit.
> > >
> > > Once it is up to date, the replay is very slow, about 15 KB/sec,
> > > and entries_behind_master just keeps running away:
> > >
> > > root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
> > > health: OK
> > > images: 3 total
> > >     3 replaying
> > >
> > > ...
> > >
> > > vm-112-disk-0:
> > >   global_id:   60a795c3-9f5d-4be3-b9bd-3df971e531fa
> > >   state:       up+replaying
> > >   description: replaying, master_position=[object_number=623,
> > > tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
> > > tag_tid=3, entry_tid=18371], entries_behind_master=327196
> > >   last_update: 2020-03-10 11:36:44
> > >
> > > ...
> > >
> > > Write traffic on the source is about 20-25 MB/sec.
> > >
> > > On the source I run 14.2.6 and on the destination 12.2.13.
> > >
> > > Any idea why the replaying is sooo slow?
> >
> > What is the latency between the two clusters?
> >
> > I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
> > config setting (it defaults to 32KiB) on your destination cluster,
> > i.e. try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
> > "[client]" section of your Ceph configuration file on the node where
> > the "rbd-mirror" daemon is running, and restart the daemon. The
> > default is a very small read size from the remote cluster, a
> > primitive attempt to reduce the potential memory usage of the
> > rbd-mirror daemon, but it has the side effect of slowing down
> > mirroring over links with higher latencies.
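> >
> > If you want to double-check that the running daemon picked up the
> > new value after the restart, you should be able to query it through
> > its admin socket, something like:
> >
> >   # socket name is an example; check /var/run/ceph/ for the real one
> >   ceph --admin-daemon /var/run/ceph/ceph-client.rbd-mirror.0.asok \
> >       config get rbd_mirror_journal_max_fetch_bytes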
> >
> > >
> > > Thanks,
> > > Michael
> > >
> >
> >
> > --
> > Jason
> >
>


-- 
Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


