Re: When ceph synchronizes journal to disk?

On 03/05/2013 05:33 AM, Xing Lin wrote:
Hi Gregory,

Thanks for your reply.

On 03/04/2013 09:55 AM, Gregory Farnum wrote:
The "journal [min|max] sync interval" values specify how frequently
the OSD's "FileStore" sends a sync to the disk. However, data is still
written into the normal filesystem as it comes in, and the normal
filesystem continues to schedule normal dirty data writeouts. This is
good — it means that when we do send a sync down you don't need to
wait for all (30 seconds * 100MB/s) 3GB or whatever of data to go to
disk before it's completed.
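For reference, these intervals are set in ceph.conf. A minimal sketch follows; it assumes the "filestore [min|max] sync interval" spellings from the Ceph documentation of that era (option names have shifted between releases), and the values shown are only illustrative, not tuning advice:

    [osd]
        # How often the FileStore flushes accumulated dirty data to the
        # data disk, in seconds. Example values only.
        filestore min sync interval = 0.01
        filestore max sync interval = 5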

I do not think I understand this well. When the writeahead journal mode
is in use, would you please explain what happens to a single 4 MB write
request? I assume that an entry in the journal is created for this
write request, and that after this entry is flushed to the journal disk, Ceph
returns success. There should be no I/O to the OSD's data disk; all I/O is
supposed to go to the journal disk. At a later time, Ceph will start to
apply these changes to the normal filesystem, reading from the first
entry past the point where its previous synchronization stopped. Finally, it will
read this entry and apply this write to the normal filesystem. Could
you please point out what is wrong in my understanding? Thanks,


All the data goes to the disk in write-back mode, so it isn't safe until the flush is called. That's why it goes into the journal first: to be consistent at all times.

If you buffered everything in the journal and flushed it all at once, you would overload the disk for that period.

Let's say you have 300 MB in the journal after 10 seconds and you want to flush it all at once. That would mean that disk is unable to do any operations other than writing at 60 MB/s for 5 seconds.
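A quick back-of-the-envelope check of those numbers (all taken from the example above, not measured anywhere):

    # Wido's example: 300 MB accumulates in the journal over 10 seconds,
    # and the data disk sustains 60 MB/s of writes.
    dirty_mb = 300
    disk_write_mb_s = 60
    stall_s = dirty_mb / disk_write_mb_s
    print(stall_s)  # 5.0 -> five seconds in which the disk does nothing but write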

It's better to always write to the disk in write-back mode and flush at a certain point.

In the meantime the I/O scheduler can do its job and balance between the reads and the writes.
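Putting Gregory's and Wido's points together, the path of a single write looks roughly like the toy sketch below. Everything in it is invented for illustration (class names, method names, the on-disk layout); a real OSD uses a ring-buffer journal and filesystem-wide syncs rather than per-file fsync and a journal truncate:

    import os

    class WAJournal:
        """Toy writeahead journal; not Ceph's actual FileStore code."""

        def __init__(self, journal_path, data_dir):
            self.journal = open(journal_path, "ab")  # lives on the journal disk
            self.data_dir = data_dir                 # lives on the OSD data disk
            self.dirty = []                          # files written since last sync

        def write(self, name, data):
            # 1. Append the entry to the journal and make it durable.
            self.journal.write(name.encode() + b"\0" + data)
            self.journal.flush()
            os.fsync(self.journal.fileno())
            # 2. Also write into the normal filesystem right away, in
            #    write-back mode: the pages go dirty, but there is no
            #    fsync here, so the kernel schedules the writeout.
            path = os.path.join(self.data_dir, name)
            with open(path, "wb") as f:
                f.write(data)
            self.dirty.append(path)
            # 3. Acknowledge as soon as the journal entry is safe.
            return "ack"

        def filestore_sync(self):
            # Runs every [min|max] sync interval: push all remaining
            # dirty data to the data disk, after which the covered
            # journal entries can be trimmed.
            for path in self.dirty:
                fd = os.open(path, os.O_RDONLY)
                os.fsync(fd)
                os.close(fd)
            self.dirty = []
            self.journal.truncate(0)  # drastic simplification of trimming

    j = WAJournal("journal.bin", ".")
    j.write("obj1", b"\x00" * (4 * 1024 * 1024))  # the 4 MB write in question
    j.filestore_sync()

The point of the sketch is step 2: the data disk is already absorbing the write in the background, so filestore_sync() only has to wait for whatever is still dirty, not for the whole interval's worth of data.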

Wido

> I am running 0.48.2. The related configuration is as follows.
If you're starting up a new cluster I recommend upgrading to the
bobtail series (.56.3) instead of using Argonaut — it's got a number
of enhancements you'll appreciate!

Yeah, I would like to use the bobtail series. However, I started making
small changes on Argonaut (0.48) and ported my changes once, to
0.48.2, when it was released. I think I am good to continue with it for
the moment. I may consider porting my changes to the bobtail series at a
later time. Thanks,

Xing


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on

