Hi Mark,

Thanks a lot for the data. Not sure if this PR will be merged soon (https://github.com/ceph/ceph/pull/4266). Some known bugs around:

- `rados ls` will cause an assert failure (fixed by the PR).
- `rbd list` will also cause an assert failure (because omap_iter hasn't been implemented yet).
- PGs will not go back to active+clean after an OSD restart (work in progress).

The most performance-relevant parts seem to be newstore_fsync_threads and the rocksdb tuning. The small rados write performance degradation is likely related to fsync, since in creation mode writes do not go through rocksdb. The RBD random write case is likely related to rocksdb, given the compaction overhead (and also the WAL log there).

Xiaoxi

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
Sent: Wednesday, April 8, 2015 9:53 AM
To: Somnath Roy; ceph-devel
Subject: Re: Initial newstore vs filestore results

Hi Somnath,

Sure. It's very easy:

1) Install or build wip-newstore
2) Add the following to your ceph.conf file:

   enable experimental unrecoverable data corrupting features = newstore rocksdb
   osd objectstore = newstore

Lots of interesting things to dig into!

Mark

On 04/07/2015 08:48 PM, Somnath Roy wrote:
> Mark,
> Could you please send the instructions out on how to use this new store?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Tuesday, April 07, 2015 6:46 PM
> To: ceph-devel
> Subject: Re: Initial newstore vs filestore results
>
>
>
> On 04/07/2015 02:16 PM, Mark Nelson wrote:
>> On 04/07/2015 09:57 AM, Mark Nelson wrote:
>>> Hi Guys,
>>>
>>> I ran some quick tests on Sage's newstore branch. So far, given that
>>> this is a prototype, things are looking pretty good imho. The 4MB
>>> object rados bench read/write and small read performance look
>>> especially good.
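As a side note on the settings discussed above: a ceph.conf sketch combining Mark's newstore lines with the fsync-thread knob Xiaoxi mentions might look like this. The section placement and the thread count of 16 are assumptions for illustration, not tested recommendations:

```ini
[osd]
; Required to enable the experimental newstore backend (from Mark's instructions).
enable experimental unrecoverable data corrupting features = newstore rocksdb
osd objectstore = newstore
; Tunable named by Xiaoxi as performance-relevant; the value 16 is
; purely illustrative, not a recommendation.
newstore fsync threads = 16
```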
>>> Keep in mind that this is not using the SSD journals in any way, so
>>> 640MB/s sequential writes is actually really good compared to
>>> filestore without SSD journals.
>>>
>>> Small write performance appears to be fairly bad, especially in the
>>> RBD case where it's small writes to larger objects. I'm going to
>>> sit down and see if I can figure out what's going on. It's bad
>>> enough that I suspect there's just something odd going on.
>>>
>>> Mark
>>
>> Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for
>> those interested:
>>
>> http://nhm.ceph.com/newstore/
>>
>> Interestingly, small object write/read performance with 4 OSDs was
>> about 1/3-1/4 the speed of the same cluster with 36 OSDs.
>>
>> Note: Thanks Dan for fixing the directory column width!
>>
>> Mark
>
> New fio/librbd results using Sage's latest code that attempts to keep
> small overwrite extents in the db. This is 4 OSDs, so not directly
> comparable to the 36 OSD tests above, but does include seekwatcher
> graphs. Results in MB/s:
>
>          write   read    randw   randr
> 4MB      57.9    319.6   55.2    285.9
> 128KB    2.5     230.6   2.4     125.4
> 4KB      0.46    55.65   1.11    3.56
>
> Seekwatcher graphs:
>
> http://nhm.ceph.com/newstore/20150407/
>
> Mark
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited.
> If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
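To put the fio/librbd table above into perspective, the MB/s figures can be converted to approximate IOPS. This is a quick sketch, assuming each I/O equals the listed block size and 1 MB = 1024 KB:

```python
# Convert the fio/librbd throughput table (MB/s) into approximate IOPS:
# IOPS = (MB/s * 1024 KB) / block size in KB.
results_mb_s = {
    # block size (KB): (write, read, randw, randr) in MB/s, from the table above
    4096: (57.9, 319.6, 55.2, 285.9),   # 4MB
    128:  (2.5, 230.6, 2.4, 125.4),     # 128KB
    4:    (0.46, 55.65, 1.11, 3.56),    # 4KB
}

def to_iops(mb_per_s, block_kb):
    """Approximate IOPS for a given throughput and I/O size."""
    return mb_per_s * 1024.0 / block_kb

for block_kb in sorted(results_mb_s):
    iops = [round(to_iops(v, block_kb), 1) for v in results_mb_s[block_kb]]
    print("%6d KB: write/read/randw/randr IOPS = %s" % (block_kb, iops))
```

For example, the 4KB random-write figure of 1.11 MB/s works out to roughly 284 IOPS per this conversion, which makes the small-write gap versus the large-object numbers easy to see at a glance.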