Re: When ceph synchronizes journal to disk?

On Tuesday, March 5, 2013 at 5:54 AM, Wido den Hollander wrote:
> On 03/05/2013 05:33 AM, Xing Lin wrote:
> > Hi Gregory,
> >  
> > Thanks for your reply.
> >  
> > On 03/04/2013 09:55 AM, Gregory Farnum wrote:
> > > The "journal [min|max] sync interval" values specify how frequently
> > > the OSD's "FileStore" sends a sync to the disk. However, data is still
> > > written into the normal filesystem as it comes in, and the normal
> > > filesystem continues to schedule normal dirty data writeouts. This is
> > > good — it means that when we do send a sync down you don't need to
> > > wait for all (30 seconds * 100MB/s) 3GB or whatever of data to go to
> > > disk before it's completed.
> >
> > I do not think I understand this well. When the writeahead journal mode
> > is in use, would you please explain what happens to a single 4M write
> > request? I assume that an entry in the journal will be created for this
> > write request and, after this entry is flushed to the journal disk, Ceph
> > returns success. There should be no IO to the OSD's disk. All IOs are
> > supposed to go to the journal disk. At a later time, Ceph will start to
> > apply these changes to the normal filesystem, starting from the first
> > journal entry after the point where its previous synchronization stopped.
> > Eventually it will read this entry and apply the write to the normal
> > filesystem. Could you please point out where my understanding is wrong?
> > Thanks,
>
> All the data goes to the disk in write-back mode, so it isn't safe
> until a flush is called. That's why it goes into the journal first, to
> be consistent at all times.
>  
> If you buffered everything in the journal and flushed it all at once, you
> would overload the disk for the duration of the flush.
>  
> Let's say you have 300MB in the journal after 10 seconds and you want to
> flush that at once. That would mean that specific disk is unable to do
> any operations other than writing at 60MB/sec for 5 seconds.
>  
> It's better to always write in write-back mode to the disk and flush at  
> a certain point.
>  
> In the meantime the scheduler can do its job to balance between the
> reads and the writes.
>  
> Wido
Yep, what Wido said. Specifically, we do force the data to the journal with an fsync or equivalent before responding to the client, but once it's stable on the journal we give it to the filesystem (without doing any sort of forced sync). This is necessary because all reads are served from the filesystem.
-Greg
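
For readers following the thread, the write path described above can be sketched roughly as below. This is a toy model under stated assumptions, not Ceph's FileStore code: the class `TinyFileStore`, its methods, and the journal entry format are all hypothetical, and it relies on `os.pwrite`/`os.pread`, which are available on Unix-like systems only.

```python
import os
import tempfile


class TinyFileStore:
    """Toy model of a write-ahead journal in front of a filesystem (hypothetical)."""

    def __init__(self, root):
        self.journal_path = os.path.join(root, "journal")
        self.data_path = os.path.join(root, "object")
        # The journal is append-only; the data file takes in-place writes.
        self.journal_fd = os.open(
            self.journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.data_fd = os.open(self.data_path, os.O_RDWR | os.O_CREAT, 0o600)
        self.unsynced = 0  # bytes handed to the filesystem since the last sync

    def write(self, offset, payload):
        # 1. Append the entry to the journal and force it to stable storage.
        header = f"{offset} {len(payload)}\n".encode()
        os.write(self.journal_fd, header + payload)
        os.fsync(self.journal_fd)
        # 2. Hand the same data to the filesystem as a plain buffered write.
        #    No fsync here: the kernel writes it back on its own schedule.
        os.pwrite(self.data_fd, payload, offset)
        self.unsynced += len(payload)
        # 3. The acknowledgement depends only on step 1 having completed.
        return "ack"

    def read(self, offset, length):
        # Reads are served from the filesystem (page cache or disk),
        # never from the journal.
        return os.pread(self.data_fd, length, offset)

    def sync(self):
        # Periodic sync: force the data file to disk, after which the
        # journaled entries are no longer needed and can be trimmed.
        os.fsync(self.data_fd)
        os.ftruncate(self.journal_fd, 0)
        self.unsynced = 0
```

Because step 2 happens right after the journal write, most dirty data has usually been written back by the time `sync()` runs, which is the point made earlier in the thread about not having to push the whole backlog at once.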

Software Engineer #42 @ http://inktank.com | http://ceph.com  
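
The two back-of-the-envelope figures in the thread (3GB of dirty data after 30 seconds at 100MB/s, and 5 seconds to flush 300MB at 60MB/s) come from the same simple arithmetic. A minimal sketch, with hypothetical helper names and assuming constant rates:

```python
def backlog_mb(ingest_mb_per_s, interval_s):
    """Data accumulated if nothing is written back during the interval."""
    return ingest_mb_per_s * interval_s


def flush_seconds(backlog, disk_mb_per_s):
    """Time the disk is fully occupied flushing the backlog in one burst."""
    return backlog / disk_mb_per_s


# Greg's example: 30 seconds at 100MB/s.
greg_backlog = backlog_mb(100, 30)      # 3000 MB, i.e. about 3GB

# Wido's example: 300MB accumulated, disk writes at 60MB/s.
wido_stall = flush_seconds(300, 60)     # 5.0 seconds of writes only
```

With continuous write-back the backlog at sync time is much smaller than these worst-case numbers, so the sync completes sooner and reads are not starved for as long.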

