RE: max useful journal size


 



Does the sync really sync between the journal and the data disk?
From the code, it seems to just call syncfs on the data disk and report the synced length to the journal. Below is my understanding of this topic:

Assume the journal is 1GB ahead of the disk. The sync will not make the disk write out that 1GB and catch up with the journal; instead, it just calls syncfs so that writes which are *finished but still in the page cache* actually go to disk. After that, we have a consistent point, and it's safe to trim the journal space those synced writes had taken (say, maybe 800MB).
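A toy Python model of the behaviour described above may help. It is a sketch under stated assumptions, not Ceph's FileStore code: the class and method names (ToyFileStore, journal_write, apply, sync) are hypothetical, and syncfs is modelled simply as "everything applied so far becomes durable". Entries that were journaled but not yet applied stay in the journal, which is why a sync trims only part of the 1GB backlog.

```python
# Toy model (hypothetical names, not Ceph code) of the sync/trim cycle above.
from dataclasses import dataclass, field

MB = 1024 * 1024


@dataclass
class ToyFileStore:
    journal: list = field(default_factory=list)  # untrimmed (seq, size_bytes) entries
    applied_seq: int = 0    # highest seq applied to the data fs (may be in page cache)
    committed_seq: int = 0  # highest seq known to be durable on the data disk

    def journal_write(self, seq: int, size: int) -> None:
        self.journal.append((seq, size))

    def apply(self, seq: int) -> None:
        # Applied to the filesystem, but possibly still only in page cache.
        self.applied_seq = max(self.applied_seq, seq)

    def journal_bytes(self) -> int:
        return sum(size for _, size in self.journal)

    def sync(self) -> int:
        """Model syncfs(): persist what is already applied, then trim the journal.

        Returns the number of journal bytes freed. Entries not yet applied are
        not forced to disk and stay in the journal.
        """
        self.committed_seq = self.applied_seq
        before = self.journal_bytes()
        self.journal = [(s, n) for s, n in self.journal if s > self.committed_seq]
        return before - self.journal_bytes()


fs = ToyFileStore()
for seq in range(1, 11):   # 10 x 100 MB, roughly 1 GB journaled ahead of the disk
    fs.journal_write(seq, 100 * MB)
for seq in range(1, 9):    # only the first 8 writes have been applied so far
    fs.apply(seq)

trimmed = fs.sync()
print(f"trimmed {trimmed // MB} MB, {fs.journal_bytes() // MB} MB still journaled")
```

Here the sync frees the 800 MB of applied writes and leaves the 200 MB of unapplied entries in place; the data disk never has to "catch up" with the whole journal.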

>>osd journal size = {2 * (expected throughput * filestore max sync interval)}

The journal will trigger a sync (by signaling sync_thread's condition variable) when it exceeds "half-full", so basically, a sync happens only when: 1) the journal reaches half full, or 2) max_sync_interval is reached.
The factor 2 seems to prevent 1) from happening; that is, it tries to make the sync happen only when the max interval is reached.
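The two triggers, and why the factor 2 makes the interval timer the usual one, can be sketched as follows. should_sync and time_to_half_full are hypothetical helpers, not functions from the Ceph source, and the sketch assumes writes arrive at a steady rate equal to the expected throughput.

```python
# Hypothetical sketch of the two sync triggers; not taken from the Ceph source.

def should_sync(journal_used_mb: float, journal_size_mb: float,
                since_last_sync_s: float, max_sync_interval_s: float) -> bool:
    """Start a sync when the journal is half full or the max interval has passed."""
    half_full = journal_used_mb >= journal_size_mb / 2
    interval_elapsed = since_last_sync_s >= max_sync_interval_s
    return half_full or interval_elapsed


def time_to_half_full(journal_size_mb: float, throughput_mb_s: float) -> float:
    """Seconds of steady writes at throughput_mb_s until the journal is half full."""
    return (journal_size_mb / 2) / throughput_mb_s


throughput = 400.0        # MB/s, example figure from this thread
max_interval = 5.0        # s
sized_2x = 2 * throughput * max_interval   # 4000 MB, per the sizing rule
sized_1x = throughput * max_interval       # 2000 MB, without the factor 2

# 2x sizing: half full exactly when the interval expires, so the
# timer and the half-full check coincide at full throughput.
print(time_to_half_full(sized_2x, throughput))
# 1x sizing: half full after half the interval, so the half-full trigger fires first.
print(time_to_half_full(sized_1x, throughput))
```

With the 2x rule, the second half of the journal is headroom for writes that keep arriving while the sync started at the interval is still running.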

Xiaoxi

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
Sent: January 19, 2013 12:56
To: Travis Rhoden
Cc: ceph-devel
Subject: Re: max useful journal size

On Fri, 18 Jan 2013, Travis Rhoden wrote:
> Thanks for the clarification, Sage. That makes sense. Especially when I
> think about it in the sense that if I have an SSD capable of
> 400MB/sec, and the journal doesn't flush for 5 seconds, there is 2GB
> of data to sync. The disk only does 100-150MB/sec, so this could take up
> to twenty seconds to write out. And all the while more data is coming
> in. Now I see the purpose of having the bigger journal.
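Travis's arithmetic, written out as a rough model. It assumes a steady 400MB/sec stream into the journal and a data disk draining at a fixed 100-150MB/sec, and it ignores throttling and other load on the disk; drain_seconds is a hypothetical helper.

```python
# Simplified model: ignores throttling and other I/O on the data disk;
# the figures are the ones quoted in this thread.

def drain_seconds(backlog_mb: float, disk_mb_s: float) -> float:
    """Time for the data disk to write out backlog_mb at disk_mb_s."""
    return backlog_mb / disk_mb_s


ssd_mb_s = 400.0                          # journal (SSD) write speed
sync_interval_s = 5.0                     # filestore max sync interval
backlog_mb = ssd_mb_s * sync_interval_s   # 2000 MB accumulated per interval

for disk_mb_s in (100.0, 150.0):
    t = drain_seconds(backlog_mb, disk_mb_s)
    print(f"{disk_mb_s:.0f} MB/s disk: {t:.1f} s to write {backlog_mb:.0f} MB; "
          f"{ssd_mb_s * t:.0f} MB more could arrive at the journal meanwhile")
```

With these inputs the drain takes roughly 13 to 20 seconds, and under sustained load the journal keeps receiving data the whole time, so a larger journal postpones a full journal but cannot prevent it.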
> 
> Of course it only goes so far, and when it's full, it's full. I get that too.
> 
> I submitted a pull request to change the docs to specify filestore max 
> sync instead of min.

Pulled, thanks!

sage


> 
> On Fri, Jan 18, 2013 at 5:56 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > On Fri, 18 Jan 2013, Travis Rhoden wrote:
> >> On Fri, Jan 18, 2013 at 5:43 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >> > On Fri, Jan 18, 2013 at 2:20 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> >> >> Hey folks,
> >> >>
> >> >> The Ceph docs give the following recommendation on sizing your journal:
> >> >>
> >> >> osd journal size = {2 * (expected throughput * filestore min 
> >> >> sync interval)}
> >> >>
> >> >> The default value of min sync interval is .01.  If you use 
> >> >> throughput of a mediocre 7200RPM drive of 100MB/sec, this comes 
> >> >> to 2 MB.  That seems like the lower bound to have the journal do anything at all.
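Plugging the example numbers into the quoted formula shows the gap between using the min interval (.01 s) and the max interval (5 s, the default mentioned later in this thread). Units are MB and seconds; the dictionary layout is just for illustration.

```python
# The sizing rule from the docs, evaluated for the 100 MB/s drive in this example.
disk_mb_s = 100.0
intervals_s = {"min sync interval": 0.01, "max sync interval": 5.0}

# osd journal size = 2 * (expected throughput * sync interval)
sizes = {label: 2 * disk_mb_s * s for label, s in intervals_s.items()}

for label, size_mb in sizes.items():
    print(f"{label} = {intervals_s[label]} s -> {size_mb:g} MB journal")
```

Using the max interval gives a 1000 MB journal instead of 2 MB, which is why the docs should name the max interval.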
> >> >
> >> > Ah. This should refer to the max sync interval, not the min!
> >>
> >> I wondered about that.  But wasn't confident enough to ask about it.
> >> >
> >> >> My question is: what is the upper bound? There's clearly a limit
> >> >> to how big to make it, beyond which it just becomes wasted space. The
> >> >> reason I want to know is that since I will be putting journals on SSDs,
> >> >> with each journal being a dedicated partition, there is a 
> >> >> benefit to not making the partition bigger than it needs to be.  
> >> >> All that unpartitioned space can be used by the SSD firmware for 
> >> >> wear-leveling and other things (so long as it remains unpartitioned).
> >> >>
> >> >> Would the following calc be appropriate?
> >> >>
> >> >> Assume an SSD write speed of 400MB/sec.  Default max sync interval is 5.
> >> >>
> >> >> 2 * (400 MB/sec * 5sec) = 4 GB.
> >> >>
> >> >> So is it appropriate to assume that if I can't write to an SSD 
> >> >> faster than 400 MB/sec, and I keep the default sync interval 
> >> >> values, a journal greater than 4GB is just a waste?
> >> >>
> >> >> I had been using 10GB journals...  seems like overkill.
> >> >>
> >> >> Or put another way, if I want to use 10GB journals, I should 
> >> >> bump the max sync interval to 12.5.
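The same rule run forwards and backwards reproduces both numbers above. The function names are hypothetical and units are decimal MB and seconds; as Greg notes below, leaving some extra room on top of the result is still advisable.

```python
# Sizing rule from this thread and its inverse (MB and seconds, decimal units).

def journal_size_mb(throughput_mb_s: float, max_sync_interval_s: float) -> float:
    """Journal size suggested for a given throughput and max sync interval."""
    return 2 * throughput_mb_s * max_sync_interval_s


def max_sync_interval_for(size_mb: float, throughput_mb_s: float) -> float:
    """Max sync interval that a journal of size_mb covers at that throughput."""
    return size_mb / (2 * throughput_mb_s)


ssd_mb_s = 400
print(journal_size_mb(ssd_mb_s, 5))              # 4000 MB, i.e. the 4 GB above
print(max_sync_interval_for(10_000, ssd_mb_s))   # 12.5 s for a 10 GB journal
```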
> >> >
> >> > It can of course grow as large as you let it, and I would leave 
> >> > some extra room as a margin. The main consideration is that the 
> >> > journal doesn't like getting too far ahead of the filestore, and 
> >> > that's what the above calculation uses to set size.
> >>
> >> Is "max sync interval" a hard stop, though?  I mean, once 5 seconds 
> >> pass, it's going to flush/sync no matter what, right? So there is 
> >> no point in making it much bigger than what can be written to the 
> >> journal in those 5 seconds. I feel like I must be missing
> >> something, though, otherwise the recommendation wouldn't be to make
> >> the journal 2x that size.
> >
> > The sync itself can take time, and we *initiate* the sync at that time.
> > Hence the 2x.  When the journal fills up there is a hefty 
> > performance hit, too.  When you adjust this down, check back at some 
> > point and make sure you don't see JOURNAL FULL messages in your log
> > that point to a problem (with the code or the tuning logic).
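One way to do that check is to count matching lines in the OSD log. This is a hypothetical helper: the exact wording and capitalization of the message may differ between versions, so it matches "journal full" case-insensitively, and the log path you pass in (often something like /var/log/ceph/ceph-osd.0.log) is an assumption about your setup.

```python
# Hypothetical helper for spotting journal-full events in an OSD log.
from typing import Iterable


def count_journal_full(lines: Iterable[str]) -> int:
    """Count lines that mention 'journal full', ignoring case."""
    return sum(1 for line in lines if "journal full" in line.lower())


def count_journal_full_in_file(log_path: str) -> int:
    """Same check on a log file; undecodable bytes are replaced, not fatal."""
    with open(log_path, errors="replace") as f:
        return count_journal_full(f)
```

For example, count_journal_full_in_file("/var/log/ceph/ceph-osd.0.log") after a period of heavy writes; a non-zero count suggests the journal or sync interval tuning needs another look.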
> >
> > Thanks!
> > sage
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 
> 