Re: How to tune Ceph RBD mirroring parameters to speed up replication

On Thu, Apr 4, 2019 at 6:27 AM huxiaoyu@xxxxxxxxxxxx
<huxiaoyu@xxxxxxxxxxxx> wrote:
>
> thanks a lot, Jason.
>
> How much performance loss should I expect from enabling rbd mirroring? I really need to minimize any performance impact while using this disaster recovery feature. Would a dedicated journal on Intel Optane NVMe help? If so, how big should it be?

The worst-case impact is effectively double the write latency and
bandwidth, since the librbd client needs to journal each IO before
committing the actual change to the image. I would definitely
recommend using a separate fast pool for the journal to minimize the
initial journal write latency hit. The librbd in-memory cache in
writeback mode can also help absorb the additional latency, since a
write IO can be (effectively) ACKed immediately as long as there is
enough space in the cache.
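
For example, a client-side ceph.conf sketch along those lines (the pool
name is just a placeholder and the cache sizes are illustrative, not
recommendations):

[client]
    # keep journal objects on a dedicated fast pool (e.g. Optane-backed)
    rbd_journal_pool = rbd-journal-fast
    # enable the in-memory cache; with max dirty > 0 it runs in writeback mode
    rbd_cache = true
    rbd_cache_size = 67108864                  # 64 MiB
    rbd_cache_max_dirty = 50331648             # 48 MiB
    # stays in writethrough until the guest issues its first flush (safety default)
    rbd_cache_writethrough_until_flush = true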

> cheers,
>
> Samuel
>
> ________________________________
> huxiaoyu@xxxxxxxxxxxx
>
>
> From: Jason Dillaman
> Date: 2019-04-03 23:03
> To: huxiaoyu@xxxxxxxxxxxx
> CC: ceph-users
> Subject: Re:  How to tune Ceph RBD mirroring parameters to speed up replication
> For better or worse, out of the box, librbd and rbd-mirror are
> configured to conserve memory at the expense of performance to support
> the potential case of thousands of images being mirrored and only a
> single "rbd-mirror" daemon attempting to handle the load.
>
> You can optimize writes by adding "rbd_journal_max_payload_bytes =
> 8388608" to the "[client]" section on the librbd client nodes.
> Normally, writes larger than 16KiB are broken into multiple journal
> entries to allow the remote "rbd-mirror" daemon to make forward
> progress w/o using too much memory, so this will ensure large IOs only
> require a single journal entry.
>
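(For reference, that is simply the following on the librbd client
nodes; option name and value exactly as above, layout illustrative:)

[client]
    rbd_journal_max_payload_bytes = 8388608
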
> You can also add "rbd_mirror_journal_max_fetch_bytes = 33554432" to
> the "[client]" section on the "rbd-mirror" daemon nodes and restart
> the daemon for the change to take effect. Normally, the daemon tries
> to nibble the per-image journal events to prevent excessive memory use
> in the case where potentially thousands of images are being mirrored.
>
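(And likewise on the rbd-mirror daemon nodes, followed by a restart of
the daemon; the exact systemd unit name is a placeholder here and
depends on how the daemon was deployed:)

[client]
    rbd_mirror_journal_max_fetch_bytes = 33554432

# e.g. systemctl restart ceph-rbd-mirror@<instance>
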
> On Wed, Apr 3, 2019 at 4:34 PM huxiaoyu@xxxxxxxxxxxx
> <huxiaoyu@xxxxxxxxxxxx> wrote:
> >
> > Hello, folks,
> >
> > I am setting up two Ceph clusters to test async replication via RBD mirroring. The two clusters are very close, in two buildings about 20 m apart, and the network between them is good as well: a 10 Gb fiber connection. In this case, how should I tune the relevant RBD mirroring parameters to accelerate replication?
> >
> > thanks in advance,
> >
> > Samuel
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
>



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


