Hi Sage and Mark,

Chendi from our team ran this test on v0.91. The setup is 4 nodes, 40 HDDs in total with SSDs as journals, replica=2. Mounting a partition from the journal SSD at current/omap raised peak 4K random write IOPS from 1524 to 2694, a 76% improvement, while the other IO patterns stayed the same. Details are below. If this reproduces on other setups, I suspect it is worth spending some time on the detection.

                     Prev        Omap2ssd
    Runid            305         320
    OP_SIZE          4k          4k
    OP_TYPE          randwrite   randwrite
    QD               qd8         qd8
    Engine           vdb         vdb
    server_num       4           4
    client_num       2           2
    rbd_num          40          40
    RBD_FIO_IOPS     1524        2694
    RBD_FIO_BW       6170.1      10864.23
    RBD_FIO_Latency  209.3851    119.4587
    osd_read_iops    7.862196    322.4334
    osd_write_iops   7677.648    10930
    osd_read_bw      0.446566    1.409266
    osd_write_bw     54.916435   71.33833

Xiaoxi

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: Wednesday, April 22, 2015 7:59 AM
To: Sage Weil; Chen, Xiaoxi
Cc: Haomai Wang; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
Subject: Re: 回复: Re: 回复: Re: 回复: Re: NewStore performance analysis

On 04/21/2015 06:57 PM, Sage Weil wrote:
> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>> ---- Sage Weil wrote ----
>>
>>> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>>>> Haomai is right in theory, but I am not sure whether all users
>>>> (mon, filestore, kvstore) of the submit_transaction API clearly
>>>> hold the expectation that their data is not persistent and may be
>>>> lost on failure. So in rocksdb the sync option now defaults to
>>>> true even in submit_transaction (and this option makes the two
>>>> APIs exactly the same). Maybe we should rename the API to
>>>> submit_transaction_persistent/nonpersistent to better describe
>>>> the behavior?
>>>
>>> Let's audit them, then.. I think they are right, but we may as well
>>> confirm!
>>>
>>> Again, FileStore is the odd one out here because it is relying on
>>> the syncfs(2) at commit time for everything.
>>>
>>
>> Yes, so maybe we don't need to expose the option to the user; we
>> can decide whether to sync in the code logic.
>
> Yeah, I think it'll reduce confusion too. I suggest we do a pull
> request against master that does this... let me know if you want to
> do it, otherwise I will!
>
>> I remember some folks on our team tried moving the KVDB to a
>> partition on SSD while leaving the other filestore data on HDD, and
>> as I recall it benefited performance. That deployment is
>> problematic with kv_sync=false. Will check the data first, and then
>> we can evaluate whether we want to support this kind of deployment.
>
> We could detect this by doing a stat(2) on the current/omap/ vs
> current/ dirs and checking if it's a different file system. If so,
> we can do the syncfs(2) on both dirs. The btrfs case would probably
> not be practical, but we can error out in that case. But yeah, not
> sure how important it would be to support this since filestore
> doesn't use leveldb that heavily... and I'd prefer to limit our
> investment of time there if we can instead make newstore (or
> something else) better.

FWIW, the last time I tried putting leveldb on SSD it didn't really help at all. It's been a while, so maybe that's changed, but newstore definitely seems like the way forward to me.

Mark
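Sage's detection idea above (stat(2) on current/omap/ vs current/ and compare filesystems) could be sketched roughly like this. This is only an illustration, not actual FileStore code; the function name and the conservative error handling are my own assumptions.

```cpp
// Sketch: decide whether current/omap lives on a different filesystem
// than current/, in which case a single syncfs(2) would not cover both.
// Not FileStore code; names here are illustrative.
#include <sys/stat.h>
#include <string>

// Returns true if the two paths are backed by different filesystems
// (different st_dev device ids); returns false on a stat() error as a
// conservative default for this sketch.
bool on_different_filesystems(const std::string& a, const std::string& b)
{
  struct stat sa, sb;
  if (stat(a.c_str(), &sa) < 0 || stat(b.c_str(), &sb) < 0)
    return false;
  return sa.st_dev != sb.st_dev;
}

// A caller would then do something like:
//   if (on_different_filesystems(basedir + "/current",
//                                basedir + "/current/omap"))
//     /* syncfs(2) an fd inside each mount instead of just one */;
```

If the check returns true, the commit path would open an fd inside each mount and syncfs(2) both; on btrfs, which relies on snapshots rather than syncfs for consistency, FileStore could simply error out on this layout, as Sage suggests.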