You can't run a Ceph OSD without a journal. The journal is always there. If you don't have a journal partition, there's a "journal" file on the OSD filesystem that does the same thing. If the journal is on a partition, that file becomes a symlink to it.

You will always be better off with the journal on a separate partition because of the way the writeback cache in Linux works (someone correct me if I'm wrong). The journal needs to flush to disk quite often, and Linux is not always able to flush only the journal data. You can't defer metadata flushing forever, and doing fsync() makes all the dirty data flush as well. ext2/3/4 also flushes data to the filesystem periodically (every 5s, I think?), which will make the latency of the journal go through the roof momentarily. (I'll leave researching how exactly XFS does it to those who care about that "filesystem'o'thing".)

P.S. I feel very strongly that this whole concept is fundamentally broken. We already have a journal for the filesystem, which is time-proven, well behaved and above all fast. Instead there's this reinvented wheel which supposedly does it better in userspace while not really avoiding the filesystem journal either. It would maybe make sense if the OSD stored the data on a block device directly, avoiding the filesystem altogether. But it would still do the same bloody thing, and (no disrespect) ext4 does this better than Ceph ever will.
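If you want to check which of the two cases above an OSD is in (plain "journal" file vs. symlink to a partition), something like this rough sketch should do. The function name describe_journal is made up for illustration, and the data directory you pass in (commonly something like /var/lib/ceph/osd/ceph-0, but that depends on your setup) is an assumption on my part, so adjust as needed.

```python
import os
import stat


def describe_journal(osd_dir):
    """Classify the 'journal' entry inside an OSD data directory.

    osd_dir is whatever your OSD data path is (an assumption here,
    not something this helper discovers for you).
    """
    path = os.path.join(osd_dir, "journal")

    # lexists() is true even for a symlink whose target is gone.
    if not os.path.lexists(path):
        return "missing"

    is_link = os.path.islink(path)
    try:
        mode = os.stat(path).st_mode  # os.stat() follows symlinks
    except OSError:
        return "dangling symlink" if is_link else "unreadable"

    if stat.S_ISBLK(mode):
        kind = "block device"   # journal lives on a partition
    elif stat.S_ISREG(mode):
        kind = "regular file"   # journal shares the OSD filesystem
    else:
        kind = "other"

    return "symlink to " + kind if is_link else kind
```

Calling describe_journal() on an OSD directory and getting "regular file" would mean the journal shares the OSD filesystem; "symlink to block device" would mean it sits on its own partition.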
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com