On 28/01/2016 22:32, Jan Schermer wrote:
Hmm, I've seen this discussed previously, but I'm not sure the fs journal could be used as a Ceph journal.

First, BTRFS doesn't have a journal per se, so you would not be able to use an XFS or ext4 style journal on another device with a data=journal setup to make write bursts and random writes fast. And I won't go back to XFS or test ext4: I've detected too much silent corruption by hardware with BTRFS to trust our data to any filesystem that doesn't verify CRCs on reads (and in our particular case the compression and speed are additional bonuses).

Second, I'm not familiar with Ceph internals, but OSDs must make sure that their PGs are synced, so I was under the impression that the content an OSD holds for a PG on the filesystem should always be guaranteed to be on all the other active OSDs *or* in their journals (so you wouldn't apply journal content unless the other journals have already committed the same content). If you remove the journals, there's no intermediate on-disk "buffer" that can be used to guarantee such a thing: one OSD will always have data that isn't guaranteed to be on disk on the others. As I understand it, you could call this some form of 2-phase commit.

I may be mistaken: there are structures in the filestore that *may* take on this role, but I'm not sure what their exact use is: the <pg_num>_TEMP dirs and the omap and meta dirs. My guess is that they serve other purposes: it would make sense to use the journals for this, because the data is already there and the commit/apply coherency barriers seem both trivial and efficient to use. That's not to say that the journals are the only way to maintain the needed coherency, just that they might be used to do so because, once they are there, this is a trivial extension of their use.

Lionel