Hi,
ceph.conf file attached. It's a little ugly because I've been playing
with various parameters. You'll probably want to enable
"debug newstore = 30" if you plan to do any debugging. Also, the code
has been changing quickly, so performance may have changed if you
haven't tested within the last week.
Mark
On 04/28/2015 09:59 PM, kernel neophyte wrote:
Hi Mark,
I am trying to measure 4k RW performance on Newstore, and I am not
anywhere close to the numbers you are getting!
Could you share your ceph.conf for these tests?
-Neo
On Tue, Apr 28, 2015 at 5:07 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Nothing official, though roughly from memory:
~1.7GB/s and something crazy like 100K IOPS for the SSD.
~150MB/s and ~125-150 IOPS for the spinning disk.
Mark
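To put the rough baseline figures above next to a 4k test, the IOPS
numbers can be converted to throughput. This is only a sketch: the
4 KiB I/O size is an assumption taken from the "4k RW" question later
in the thread, since the message does not say what block size the IOPS
figures were measured at, and the helper name is made up for
illustration.

```python
def iops_to_mib_per_s(iops: float, io_size_bytes: int = 4096) -> float:
    """Convert an IOPS figure to throughput in MiB/s for a fixed I/O size."""
    return iops * io_size_bytes / (1024 * 1024)


# Rough figures quoted in the message; the 4 KiB I/O size is assumed.
baselines = [
    ("SSD", 100_000),
    ("spinning disk (low)", 125),
    ("spinning disk (high)", 150),
]

for label, iops in baselines:
    mib = iops_to_mib_per_s(iops)
    print(f"{label}: {iops} IOPS at 4 KiB = {mib:.2f} MiB/s")
```

Under that assumption, small random I/O on the spinning disk moves well
under 1 MiB/s, far below the ~150MB/s quoted for the same disk.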
On 04/28/2015 07:00 PM, Venkateswara Rao Jujjuri wrote:
Thanks for sharing; the newstore numbers look a lot better.
I'm wondering if we have any baseline numbers to put things into
perspective, e.g. what do we get on XFS or on librados?
JV
On Tue, Apr 28, 2015 at 4:25 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Guys,
Sage has been furiously working away at fixing bugs in newstore and
improving performance. Specifically we've been focused on write
performance, as newstore was previously lagging filestore by quite a
bit. A lot of work has gone into implementing libaio behind the scenes,
and as a result performance on spinning disks with SSD WAL (and SSD
backed rocksdb) has improved pretty dramatically. It's now often
beating filestore:
http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
On the other hand, sequential writes are slower than random writes when
the OSD, DB, and WAL are all on the same device, be it a spinning disk
or SSD. In this situation newstore does better with random writes and
sometimes beats filestore (such as in the everything-on-spinning-disk
tests, and when IO sizes are small in the everything-on-SSD tests).
Newstore is changing daily, so keep in mind that these results are
almost assuredly going to change. An interesting area of investigation
will be why sequential writes are slower than random writes, and
whether or not we are being limited by rocksdb ingest speed and how.
I've also uploaded a quick perf call-graph I grabbed during the
"all-SSD" 32KB sequential write test to see if rocksdb was starving one
of the cores, but found something that looks quite a bit different:
http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[global]
osd pool default size = 1
osd crush chooseleaf type = 0
enable experimental unrecoverable data corrupting features = newstore rocksdb
osd objectstore = newstore
# newstore aio max queue depth = 4096
# newstore overlay max length = 8388608
# rocksdb wal dir = "/wal"
# newstore db path = "/wal"
newstore overlay max = 0
newstore_wal_threads = 8
rocksdb_write_buffer_size = 536870912
rocksdb_write_buffer_num = 4
rocksdb_min_write_buffer_number_to_merge = 2
rocksdb_log = /home/nhm/tmp/cbt/ceph/log/rocksdb.log
rocksdb_max_background_compactions = 4
rocksdb_compaction_threads = 4
rocksdb_level0_file_num_compaction_trigger = 4
rocksdb_max_bytes_for_level_base = 104857600 # 100MB
rocksdb_target_file_size_base = 10485760 # 10MB
rocksdb_num_levels = 3
rocksdb_compression = none
keyring = /home/nhm/tmp/cbt/ceph/keyring
osd pg bits = 8
osd pgp bits = 8
auth supported = none
log to syslog = false
log file = /home/nhm/tmp/cbt/ceph/log/$name.log
filestore xattr use omap = true
auth cluster required = none
auth service required = none
auth client required = none
public network = 192.168.10.0/24
cluster network = 192.168.10.0/24
rbd cache = true
osd scrub load threshold = 0.01
osd scrub min interval = 137438953472
osd scrub max interval = 137438953472
osd deep scrub interval = 137438953472
osd max scrubs = 16
filestore merge threshold = 40
filestore split multiple = 8
osd op threads = 8
debug newstore = "0/0"
debug_lockdep = "0/0"
debug_context = "0/0"
debug_crush = "0/0"
debug_mds = "0/0"
debug_mds_balancer = "0/0"
debug_mds_locker = "0/0"
debug_mds_log = "0/0"
debug_mds_log_expire = "0/0"
debug_mds_migrator = "0/0"
debug_buffer = "0/0"
debug_timer = "0/0"
debug_filer = "0/0"
debug_objecter = "0/0"
debug_rados = "0/0"
debug_rbd = "0/0"
debug_journaler = "0/0"
debug_objectcacher = "0/0"
debug_client = "0/0"
debug_osd = "0/0"
debug_optracker = "0/0"
debug_objclass = "0/0"
debug_filestore = "0/0"
debug_journal = "0/0"
debug_ms = "0/0"
debug_mon = "0/0"
debug_monc = "0/0"
debug_paxos = "0/0"
debug_tp = "0/0"
debug_auth = "0/0"
debug_finisher = "0/0"
debug_heartbeatmap = "0/0"
debug_perfcounter = "0/0"
debug_rgw = "0/0"
debug_hadoop = "0/0"
debug_asok = "0/0"
debug_throttle = "0/0"
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
# debug optracker = 30
# debug tp = 5
# objecter inflight op bytes = 1073741824
# objecter inflight ops = 8192
# filestore wbthrottle enable = false
# debug osd = 20
# filestore wbthrottle xfs ios start flusher = 500
# filestore wbthrottle xfs ios hard limit = 5000
# filestore wbthrottle xfs inodes start flusher = 500
# filestore wbthrottle xfs inodes hard limit = 5000
# filestore wbthrottle xfs bytes start flusher = 41943040
# filestore wbthrottle xfs bytes hard limit = 419430400
# filestore wbthrottle btrfs ios start flusher = 500
# filestore wbthrottle btrfs ios hard limit = 5000
# filestore wbthrottle btrfs inodes start flusher = 500
# filestore wbthrottle btrfs inodes hard limit = 5000
# filestore wbthrottle btrfs bytes start flusher = 41943040
# filestore wbthrottle btrfs bytes hard limit = 419430400
[mon]
mon data = /home/nhm/tmp/cbt/ceph/mon.$id
[mon.a]
host = burnupiX
mon addr = 127.0.0.1:6789
[osd.0]
host = burnupiX
osd data = /home/nhm/tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
# osd journal = /dev/sds1
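
The attached file mixes spaced and underscored option names (for
example "newstore overlay max" and "newstore_wal_threads"); Ceph treats
spaces, underscores, and dashes in option names as equivalent. When
comparing this file against another ceph.conf, it can help to normalize
the names first. The following is a minimal sketch using Python's
configparser, which is not Ceph's own parser and only approximates it;
the SAMPLE text is a short excerpt of the file above, and the function
names are made up for illustration.

```python
import configparser

# A short excerpt of the attached ceph.conf, used here as sample input.
SAMPLE = """
[global]
osd objectstore = newstore
newstore overlay max = 0
newstore_wal_threads = 8
debug newstore = "0/0"
log file = /home/nhm/tmp/cbt/ceph/log/$name.log
# debug osd = 20
[osd.0]
host = burnupiX
"""


def normalize(name: str) -> str:
    """Fold spaces and dashes in an option name to single underscores."""
    return "_".join(name.replace("-", " ").replace("_", " ").split())


def load(text: str) -> dict:
    """Parse ceph.conf-style text into {section: {normalized_name: value}}."""
    parser = configparser.ConfigParser(
        interpolation=None,          # leave $name and similar untouched
        delimiters=("=",),           # ceph.conf uses '=' between name and value
        comment_prefixes=("#", ";"),
    )
    parser.optionxform = str         # do not lowercase option names
    parser.read_string(text)
    return {
        section: {
            normalize(key): value.strip().strip('"')
            for key, value in parser.items(section)
        }
        for section in parser.sections()
    }


settings = load(SAMPLE)
for section, options in settings.items():
    for key, value in sorted(options.items()):
        print(f"[{section}] {key} = {value}")
```

With both files loaded this way, a plain dictionary comparison shows
which options differ regardless of how the names were spelled.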