Re: rbd-mirror replay is very slow - but initial bootstrap is fast

On Tue, Mar 10, 2020 at 11:53 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
>
> Hello Jason,
>
> okay, good hint!
>
> I did not realize that it replays the journal writes 1:1, but that makes
> sense. I will benchmark it later.

Yes, it's replaying the exact IOs again to ensure it's point-in-time consistent.
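As a rough illustration of why this matters on spinning disks, the arithmetic can be sketched as below. The 4 KiB write size is purely an assumption (the real average IO size on the primary image is not known in this thread), and 4 MiB is the default RBD object size used here to stand in for the whole-object writes of the initial sync. The throughput figures are the approximate ones reported earlier in the thread.

```python
# Back-of-the-envelope IOPS needed to sustain a given write throughput.
# The 4 KiB IO size is an assumption for illustration; measure the real one.

KiB = 1024
MiB = 1024 * KiB


def required_iops(throughput_bytes_per_sec: float, io_size_bytes: int) -> float:
    """Number of write operations per second needed at a given IO size."""
    return throughput_bytes_per_sec / io_size_bytes


# ~20 MB/s of client writes, replayed one by one as small (assumed 4 KiB) writes
journal_replay_iops = required_iops(20 * MiB, 4 * KiB)

# ~250 MB/s initial sync, issued as whole-object (4 MiB) writes
initial_sync_iops = required_iops(250 * MiB, 4 * MiB)

print(f"journal replay: {journal_replay_iops:.0f} IOPS")
print(f"initial sync:   {initial_sync_iops:.1f} IOPS")
```

Under these assumptions the replay needs thousands of small writes per second while the initial sync needs only a few dozen large ones, which would explain a fast bootstrap followed by a slow replay on the same hardware.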

> However, my backup cluster is the place where the old spinning rust
> will find its last dedication.
> Therefore it will never be as fast as the live cluster.
>
> Looking at the modes, should I change from journal-based to
> snapshot-based mirroring?

Well, snapshot-based mirroring hasn't been released yet (technically)
since it's new with Octopus. It might be better in such an
environment, however, since it has the potential to reduce the number
of IOs.
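
A toy model of where that reduction could come from, assuming the workload overwrites the same blocks repeatedly between two mirror snapshots. This is not rbd-mirror code: the function names, the 4 KiB block size and the workload are all hypothetical, and the real granularity at which deltas are copied may differ.

```python
# Toy comparison only; block size and workload are illustrative assumptions.
BLOCK = 4096


def journal_replay_ios(writes):
    """Journal-based: every recorded write is replayed individually."""
    return len(writes)


def snapshot_delta_blocks(writes, block=BLOCK):
    """Snapshot-based: only blocks that differ between two snapshots are
    copied, so repeated overwrites of the same block count once."""
    dirty = set()
    for offset, length in writes:
        first = offset // block
        last = (offset + length - 1) // block
        dirty.update(range(first, last + 1))
    return len(dirty)


# Hypothetical workload: 1000 writes that keep hitting the same ten blocks
writes = [((i % 10) * BLOCK, BLOCK) for i in range(1000)]

print("journal replay IOs:   ", journal_replay_ios(writes))
print("snapshot delta blocks:", snapshot_delta_blocks(writes))
```

A workload that never rewrites the same block would see no reduction in this model, so the benefit depends heavily on the write pattern.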

> Thanks,
> Michael
>
> On Tue, Mar 10, 2020 at 3:43 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 10, 2020 at 10:36 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hello Jason,
> > >
> > > thanks for that fast reply.
> > >
> > > This is now my /etc/ceph/ceph.conf
> > >
> > > [client]
> > > rbd_mirror_journal_max_fetch_bytes = 4194304
> > >
> > >
> > > I stopped and started my rbd-mirror manually with:
> > > rbd-mirror -d -c /etc/ceph/ceph.conf
> > >
> > > Still the same result: iftop shows slow throughput, and
> > > entries_behind_master keeps growing quickly when I generate 20 MB/sec
> > > of write traffic on the replicated image.
> > >
> > > The latency is like:
> > >  --- 10.10.50.1 ping statistics ---
> > > 100 packets transmitted, 100 received, 0% packet loss, time 20199ms
> > > rtt min/avg/max/mdev = 0.067/0.286/1.418/0.215 ms
> > >
> > > iperf from the source node to the destination node (where the
> > > rbd-mirror runs): 8.92 Gbits/sec
> > >
> > > Any other idea?
> >
> > Do you know the average IO sizes against the primary image? Can you
> > create a similar image in the secondary cluster and run "fio" or "rbd
> > bench-write" against it using similar settings to verify that your
> > secondary cluster can handle the IO load? The initial image sync
> > portion will be issuing large, whole-object writes whereas the journal
> > replay will replay the writes exactly as written in the journal.
> >
> > > Thanks,
> > > Michael
> > >
> > >
> > >
> > > On Tue, Mar 10, 2020 at 2:19 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Mar 10, 2020 at 6:47 AM Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hello List,
> > > > >
> > > > > When I initially enable journaling/mirroring on an image, it gets
> > > > > bootstrapped to my site-b pretty quickly at 250 MB/sec, which is about
> > > > > the write I/O limit.
> > > > >
> > > > > Once it's up to date, the replay is very slow, about 15 KB/sec, and
> > > > > entries_behind_master is just running away:
> > > > >
> > > > > root@ceph01:~# rbd --cluster backup mirror pool status rbd-cluster6 --verbose
> > > > > health: OK
> > > > > images: 3 total
> > > > >     3 replaying
> > > > >
> > > > > ...
> > > > >
> > > > > vm-112-disk-0:
> > > > >   global_id:   60a795c3-9f5d-4be3-b9bd-3df971e531fa
> > > > >   state:       up+replaying
> > > > >   description: replaying, master_position=[object_number=623,
> > > > > tag_tid=3, entry_tid=345567], mirror_position=[object_number=35,
> > > > > tag_tid=3, entry_tid=18371], entries_behind_master=327196
> > > > >   last_update: 2020-03-10 11:36:44
> > > > >
> > > > > ...
> > > > >
> > > > > Write traffic on the source is about 20-25 MB/sec.
> > > > >
> > > > > On the source I run 14.2.6 and on the destination 12.2.13.
> > > > >
> > > > > Any idea why the replay is so slow?
> > > >
> > > > What is the latency between the two clusters?
> > > >
> > > > I would recommend increasing the "rbd_mirror_journal_max_fetch_bytes"
> > > > config setting (defaults to 32KiB) on your destination cluster, i.e.
> > > > try adding "rbd_mirror_journal_max_fetch_bytes = 4194304" to the
> > > > "[client]" section of your Ceph configuration file on the node where
> > > > the "rbd-mirror" daemon is running, and restart it. It defaults to a very
> > > > small read size from the remote cluster in a primitive attempt to
> > > > reduce the potential memory usage of the rbd-mirror daemon, but it has
> > > > the side-effect of slowing down mirroring for links with higher
> > > > latencies.
> > > >
> > > > >
> > > > > Thanks,
> > > > > Michael
> > > > > _______________________________________________
> > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > >
> > > >
> > > >
> > > > --
> > > > Jason
> > > >
> > >
> >
> >
> > --
> > Jason
> >
>


-- 
Jason
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
