Re: RBD-Mirror - Journal location

I would actually recommend the exact opposite configuration for a
high-performance, journaled image: a small but fast SSD/NVMe-backed
pool for the journal data, and a large pool for your image data.
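
For example (the pool names, PG counts, and CRUSH rule names below are
hypothetical -- just a sketch of the layout, assuming you already have
CRUSH rules that target your SSD/NVMe and HDD OSDs):

    # small, fast pool to hold the journal_data objects
    ceph osd pool create rbd-journal 128 128 replicated ssd-rule

    # large pool on spinning disks for the rbd_data objects
    ceph osd pool create rbd-data 2048 2048 replicated hdd-rule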

With the librbd in-memory writeback cache enabled, IO operations will
be completed as soon as they are stored in the cache. This will help
to alleviate some of the extra latency from appending the journal
events. However, if your cache is full, writeback will be paused until
the associated journal events are safely committed to disk so that
your image can remain consistent upon failure.
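
A minimal sketch of the relevant librbd cache settings (the values
shown are just the Jewel-era defaults, not tuning advice):

    [client]
    rbd cache = true
    rbd cache size = 33554432                  # 32MB in-memory cache
    rbd cache max dirty = 25165824             # writes stall once this much is dirty
    rbd cache target dirty = 16777216          # flushing begins at this watermark
    rbd cache writethrough until flush = true  # stay safe until the guest flushes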

There are a few configuration knobs that can be used to batch journal
append operations to the OSDs to help reduce the IOPS load. These are
"rbd_journal_object_flush_interval",
"rbd_journal_object_flush_bytes", and "rbd_journal_object_flush_age"
-- which control, respectively, the maximum number of events to batch,
the number of pending journal event bytes to batch, and the maximum
age in seconds to batch. For example, if your workload consists of
mostly 512-byte IO, setting "rbd_journal_object_flush_interval = 30"
in your config file would collapse thirty 512-byte(ish) journal append
operations into one ~15KB journal append operation.
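
Putting those together, a hypothetical [client] section might look
like this (the values are illustrative, not recommendations):

    [client]
    rbd journal object flush interval = 30   # flush after 30 pending events...
    rbd journal object flush bytes = 16384   # ...or after ~16KB of pending events...
    rbd journal object flush age = 1         # ...or after 1 second, whichever is first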

Also note that the forthcoming 10.2.4 release should include a
noticeable performance boost for the journal, since it reduces lock
contention and automatically batches events based upon the latency of
the OSD responses.

On Mon, Oct 10, 2016 at 9:14 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Tue, 11 Oct 2016 01:07:16 +0000 Cory Hawkless wrote:
>
>> Thanks Jason, works perfectly.
>>
>> Do you know if Ceph blocks the client IO until the journal has
>> acknowledged its write? I.e., can I store my journal on slower
>> disks, or will that have a negative impact on performance?
>>
> Knowing nothing about this, the little detail that it's a generic
> pool would suggest that all the usual rules and suspects apply.
>
> One assumes the RBD mirror needs to keep a crash-safe state, so even
> if its writes were allowed to be asynchronous, how much of a backlog
> (and thus memory consumption) would be permissible?
>
> So my guess is that journals on slow disks would be a no-no.
>
> Let's see what Jason has to say.
>
> Christian
>
>> Is there perhaps a hole in the documentation here? I've not been
>> able to find anything in the man page for RBD or on the Ceph website.
>>
>> Regards,
>> Cory
>>
>>
>> -----Original Message-----
>> From: Jason Dillaman [mailto:jdillama@xxxxxxxxxx]
>> Sent: Tuesday, 11 October 2016 7:57 AM
>> To: Cory Hawkless <Cory@xxxxxxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  RBD-Mirror - Journal location
>>
>> Yes, the "journal_data" objects can be stored in a separate pool
>> from the image. The rbd CLI allows you to use the "--journal-pool"
>> argument when creating, copying, cloning, or importing an image with
>> journaling enabled. You can also specify the journal data pool when
>> dynamically enabling the journaling feature using the same argument.
>> Finally, there is a Ceph config setting of "rbd journal pool = XYZ"
>> that allows you to default new journals to a specific pool.
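>>
>> For example (the pool and image names here are hypothetical):
>>
>>     # create a journaled image whose journal lives in a separate pool
>>     rbd create sas/myimage --size 102400 \
>>         --image-feature exclusive-lock,journaling \
>>         --journal-pool rbd-journal
>>
>>     # or enable journaling (and pick the journal pool) on an existing image
>>     rbd feature enable sas/myimage journaling --journal-pool rbd-journal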
>>
>> Jason
>>
>> On Mon, Oct 10, 2016 at 1:59 AM, Cory Hawkless <Cory@xxxxxxxxxxxxxx> wrote:
>> > I've enabled RBD mirroring on my test clusters and it seems to be
>> > working well. My question is: can we store the RBD mirror journal
>> > on a different pool?
>> >
>> > Currently when I do something like "rados ls -p sas" I see:
>> >
>> > rbd_data.a67d02eb141f2.0000000000000bd1
>> > rbd_data.a67d02eb141f2.0000000000000b73
>> > rbd_data.a67d02eb141f2.000000000000036d
>> > rbd_data.a67d02eb141f2.000000000000074e
>> > journal_data.75.a67d02eb141f2.175
>> > rbd_data.a67d02eb141f2.0000000000000bb6
>> > rbd_data.a67d02eb141f2.0000000000000bae
>> > rbd_data.a67d02eb141f2.0000000000000313
>> > rbd_data.a67d02eb141f2.0000000000000bb3
>> >
>> > Depending on how far behind the remote cluster is on its sync,
>> > there are more or fewer journal entries.
>> >
>> > I am worried about the overhead of storing the journal on the same set
>> > of disks as the actual RBD images.
>> >
>> > My understanding is that enabling journaling is going to double
>> > the IOPS on the disks -- is that correct?
>> >
>> > Any assistance appreciated
>> >
>> > Regards,
>> > Cory
>>
>> --
>> Jason
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/



-- 
Jason