------------------
hzwulibin
2016-09-05

-------------------------------------------------------------
From: 한승진 <yongiman@xxxxxxxxx>
Sent: 2016-09-05 10:31
To: huang jun
Cc: ceph-users
Subject: Re: ceph journal system vs filesystem journal system

> Hi huang jun,
>
> Thanks for your reply. I am still really confused...
>
> According to your reply, ceph's journal operates as a full journal
> (meaning both metadata and object data are journaled). Is that right?

Yes, that's right.

> I tested the journal I/O with rados bench and traced it with blktrace.
>
> [client node]
> root@ceph-mon01:~# rados -p rados-test df
> pool name        KB         objects  clones  degraded  unfound  rd  rd KB  wr   wr KB
> rados-test       *528384*   129      0       258       0        0   0      129  4
>   total used     658876     129
>   total avail    156547604
>   total space    157206480
>
> [osd node - blktrace output]
> CPU0 (8,16):
>  Reads Queued:        0,        0KiB   Writes Queued:      821,   551804KiB
>  Read Dispatches:     0,        0KiB   Write Dispatches:   588,   529428KiB
>  Reads Requeued:      0                Writes Requeued:      0
>  Reads Completed:     0,        0KiB   Writes Completed:   704,   *529428KiB*
>  Read Merges:         0,        0KiB   Write Merges:       117,     6144KiB
>  Read depth:          0                Write depth:         32
>  IO unplugs:        117                Timer unplugs:        0
>
> The result also seems to say that the full data is written to the journal.
>
> However, the ceph documentation says:
>
> *Consistency:* Ceph OSD Daemons require a filesystem interface that
> guarantees atomic compound operations.

I think this means ceph needs a filesystem that can guarantee atomic
compound operations, but few filesystems can guarantee that. So ceph uses
a journal, which can put a 'compound operation' into a single write
operation to guarantee atomicity.

> Ceph OSD Daemons *write a description of the operation to the journal*
> and *apply the operation to the filesystem*

This is how filestore implements it:

1. Write the 'compound operation' into the journal (just one write, so it
   is atomic).
2. Apply the operations to the filestore and maps.
3. Trim the journal.

> How should I understand the documentation above?
>
> I would really appreciate your help.
>
> Thanks.
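
As a rough illustration of the three filestore steps listed in the reply above (journal the compound operation, apply it, trim the journal), here is a minimal Python sketch. It is a toy model and not Ceph code: the class ToyJournaledStore, its methods, the JSON record format, and the crash_before_apply flag are all hypothetical names made up for this example. It also assumes a journal record is never torn by a partial write, which a real journal would have to detect (for example with checksums).

```python
import json
import os
import tempfile


class ToyJournaledStore:
    """Toy write-ahead journal (hypothetical; not Ceph's FileStore code)."""

    def __init__(self, root):
        self.journal_path = os.path.join(root, "journal")
        self.data_dir = os.path.join(root, "data")
        os.makedirs(self.data_dir, exist_ok=True)
        # On startup, replay anything journaled but not yet applied.
        self.apply_pending()

    def submit(self, ops, crash_before_apply=False):
        # Step 1: append the whole compound operation (names AND data)
        # to the journal as one record and force it to disk.
        with open(self.journal_path, "a") as journal:
            journal.write(json.dumps(ops) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        if crash_before_apply:
            return  # simulate an outage right after the journal write
        # Steps 2 and 3: apply to the backing files, then trim.
        self.apply_pending()

    def apply_pending(self):
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path) as journal:
            records = [json.loads(line) for line in journal if line.strip()]
        # Step 2: apply every journaled compound operation in order.
        for ops in records:
            for op in ops:
                path = os.path.join(self.data_dir, op["name"])
                with open(path, "w") as f:
                    f.write(op["data"])
        # Step 3: trim the journal now that its records are applied.
        open(self.journal_path, "w").close()

    def read(self, name):
        with open(os.path.join(self.data_dir, name)) as f:
            return f.read()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    store = ToyJournaledStore(root)
    # A compound operation: object data plus a metadata-like entry.
    store.submit([{"name": "obj1", "data": "hello"},
                  {"name": "obj1.meta", "data": "v1"}])
    # Journaled but never applied, as if the node lost power.
    store.submit([{"name": "obj2", "data": "world"}], crash_before_apply=True)
    # A new instance replays the journal and recovers obj2.
    recovered = ToyJournaledStore(root)
    print(recovered.read("obj2"))
```

Because the full payload goes into the journal record, the toy journal sees at least as many bytes as the objects themselves, which matches the observation in the blktrace numbers quoted above. Replaying the journal on startup is what lets data survive an outage between step 1 and step 2.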

> 2016-09-01 19:09 GMT+09:00 huang jun <hjwsm1989@xxxxxxxxx>:
>
> > 2016-09-01 17:25 GMT+08:00 한승진 <yongiman@xxxxxxxxx>:
> > > Hi all.
> > >
> > > I'm very confused about the ceph journal system.
> > >
> > > Some people say the ceph journal works like a Linux journaling
> > > filesystem.
> > >
> > > Other people say all data is written to the journal first and then
> > > written to the OSD data.
> > >
> > > Does the Ceph journal record just the metadata of an object, or all
> > > of the object's data?
> > >
> > > Which is right?
> >
> > Data written to an OSD is first written to the OSD journal through
> > direct I/O (dio) and then submitted to the objectstore. This improves
> > small-file write performance because journal writes are sequential,
> > not random. The journal can also recover data that was written to the
> > journal but not yet written to the objectstore, for example after an
> > outage.
> >
> > > Thanks for your help
> > >
> > > Best regards.
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > Thank you!
> > HuangJun

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com