It is not only about consistency between memory and disk. The key point is
to implement the atomicity of a transaction. That is, when a transaction
needs to write an object and update the pglog at the same time, we must
make sure that either both IOs happen or neither does. With the journal,
when the OSD recovers from a failure, the replay process can redo the
transaction. I think that is why the journal cannot be disabled.

On 11 January 2014 13:24, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
> On Fri, Jan 10, 2014 at 11:13 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Exactly. We can't do a safe update without a journal - what if power
>> goes out while the write is happening? When we boot back up, we don't
>> know what version the object is actually at. So if you're using btrfs,
>> you can run without a journal already (and depend on snapshots for
>> recovering after failures); if you are using xfs or ext4 a journal is
>> required for any safety at all, even when it's fronted by a cache
>> pool.
>
> I don't fully agree with that. Why can't we call "fdatasync()" during
> each transaction to ensure consistency if there is a cache in front
> of it?
>
>>
>> On Thu, Jan 9, 2014 at 7:08 PM, Dong Yuan <yuandong1222@xxxxxxxxx> wrote:
>>> The Journal is part of the implementation of the ObjectStore Transaction
>>> Interface, and the transaction is used by the PG to write the pglog
>>> together with object data in one transaction.
>>> So I think if the FileJournal could be disabled, there must be
>>> something else to implement the Transaction Interface. But that seems
>>> hard, since in my opinion no local file system provides such a function.
>>>
>>>
>>> On 10 January 2014 10:04, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>> On Fri, Jan 10, 2014 at 1:28 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>>
>>>>> The FileJournal is also for data safety whenever we're using write
>>>>> ahead.
>>>>> To disable it we need a backing store that we know can provide
>>>>> us consistent checkpoints (i.e., we can use parallel journaling mode -
>>>>> so for the FileJournal, we're using btrfs, or maybe zfs someday). But
>>>>> for those systems you can already configure the system not to use a
>>>>> journal.
>>>>
>>>> Yes, it depends on the backend. For example, FileStore can write an
>>>> object with sync to ensure consistency. If we add an option to disable
>>>> FileJournal, we need some work on FileStore to implement it.
>>>>
>>>>> -Greg
>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 12:13 AM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > We know FileJournal plays an important role in the FileStore backend;
>>>>> > it can hugely reduce write latency and improve small write operations.
>>>>> >
>>>>> > But in practice there are exceptions, such as when we already use
>>>>> > FlashCache or cachepool (although it's not ready).
>>>>> >
>>>>> > If cachepool is enabled, we may use a journal in cache_pool but may
>>>>> > not want to use a journal in base_pool. The main reason to drop the
>>>>> > journal in base_pool is that the journal takes up a single physical
>>>>> > device and wastes too much in base_pool.
>>>>> >
>>>>> > As above, if I enable FlashCache or another cache, I'd rather not
>>>>> > enable the journal in the OSD layer.
>>>>> >
>>>>> > So is it necessary to allow disabling the journal in special (not
>>>>> > really special) cases?
>>>>> >
>>>>> > Best regards,
>>>>> > Wheats
>>>>> >
>>>>> > --
>>>>> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>>
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>> --
>>> Dong Yuan
>>> Email:yuandong1222@xxxxxxxxx
>
> --
> Best Regards,
>
> Wheat

--
Dong Yuan
Email:yuandong1222@xxxxxxxxx
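
P.S. A minimal sketch of the atomicity argument at the top of this message, written in Python for brevity. It is not Ceph code: the class JournaledStore, its methods, and the file names "object" and "pglog" are hypothetical names made up for illustration, and the crash is only simulated by skipping the apply step. The idea it shows is write-ahead journaling: the whole transaction is made durable as one journal record first, so after a failure the replay step can redo both writes together.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy store (hypothetical, not the Ceph FileStore): each transaction is
    appended to a journal and fsynced before it is applied to data files."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.journal_path = os.path.join(root, "journal")

    def _apply(self, txn):
        # Each op is a full overwrite, so applying it twice is harmless.
        for name, data in txn:
            with open(os.path.join(self.root, name), "w") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())

    def submit(self, txn, crash_after_journal=False):
        # 1. Write-ahead: persist the whole transaction as one record.
        with open(self.journal_path, "a") as j:
            j.write(json.dumps(txn) + "\n")
            j.flush()
            os.fsync(j.fileno())
        if crash_after_journal:
            return  # simulated power loss before any data file is touched
        # 2. Apply the ops to the backing files.
        self._apply(txn)

    def replay(self):
        # After a restart, redo every complete journal record in order.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path) as j:
            for line in j:
                if not line.endswith("\n"):
                    break  # torn record: the transaction never committed
                self._apply(json.loads(line))

    def read(self, name):
        path = os.path.join(self.root, name)
        if not os.path.exists(path):
            return None
        with open(path) as f:
            return f.read()


# One transaction updates the object and the pglog together.
root = tempfile.mkdtemp()
store = JournaledStore(root)
store.submit([["object", "v2"], ["pglog", "object@v2"]],
             crash_after_journal=True)

# "Reboot": a fresh instance replays the journal and redoes both writes.
restarted = JournaledStore(root)
restarted.replay()
print(restarted.read("object"), restarted.read("pglog"))
```

The sketch never trims its journal and replays everything on every start, which only works because each op is an idempotent overwrite. A plain fdatasync() per write, by contrast, makes each file durable on its own but says nothing about whether the second file was written when power went out between the two.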