Hi Folks,
We are actually in the middle of doing some bluestore testing/tuning for
the upstream Jewel release as we speak. :) These are (so far) pure HDD
tests using 4 nodes with 4 spinning disks each and no SSDs.
Basically, on the write side it's looking fantastic, and that's an area we
really wanted to improve, so that's great. On the read side, we are
working on getting sequential read performance up for certain IO sizes.
We are more dependent on client-side readahead with bluestore, since
there is no underlying filesystem below the OSDs helping us out. This
usually isn't a problem in practice since there should be readahead in
the VM, but when testing with fio using the RBD engine you should
probably enable client-side RBD readahead:
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
Again, this probably only matters when directly using librbd.
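For reference, a minimal fio job for the rbd engine might look something
like this sketch (the client name, pool, image name, block size and runtime
are just example values, and the image would need to exist beforehand):
[seq-read]
# rbd engine talks to librbd directly; pool/image below are placeholders
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fiotest
rw=read
bs=128k
iodepth=16
runtime=60
time_based
With the readahead options above in the client's ceph.conf, librbd should
then prefetch ahead of that kind of sequential workload.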
The other question is whether to use default buffered reads in bluestore, i.e. setting:
"bluestore default buffered read = true"
That's what we are working on testing now. I've included the ceph.conf
used for these tests and also a link to some of our recent results.
Please download the results and open them in LibreOffice, as Google's
preview isn't showing the graphs.
Here's how the legend is set up:
Hammer-FS: Hammer + Filestore
6dba7fd-BS (No RBD RA): Master + Fixes + Bluestore
6dba7fd-BS (4M RBD RA): Master + Fixes + Bluestore + 4M RBD Read Ahead
c1e41afb-FS: Master + Filestore + new journal throttling + Sam's tuning
https://drive.google.com/file/d/0B2gTBZrkrnpZMl9OZ18yS3NuZEU/view?usp=sharing
Mark
On 03/14/2016 11:04 AM, Kenneth Waegeman wrote:
Hi Stefan,
We are also interested in bluestore, but have not looked into it yet.
We tried keyvaluestore before, and that could be enabled by setting the
osd objectstore value.
And in this ticket http://tracker.ceph.com/issues/13942 I see:
[global]
enable experimental unrecoverable data corrupting features = *
bluestore fsck on mount = true
bluestore block db size = 67108864
bluestore block wal size = 134217728
bluestore block size = 5368709120
osd objectstore = bluestore
So I guess this could work for bluestore too.
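If it does, one way to double-check which backend an OSD actually came up
with should be the admin socket, e.g. (osd.0 is just an example):
ceph daemon osd.0 config get osd_objectstore
which should report bluestore rather than filestore.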
Very curious to hear what you see stability- and performance-wise :)
Cheers,
Kenneth
On 14/03/16 16:03, Stefan Lissmats wrote:
Hello everyone!
I think the new bluestore sounds great and I wanted to try it out in my
test environment. I didn't find much documentation on how to use it, but
I finally managed to test it, and it really looks promising
performance-wise.
If anyone has more information or guides for bluestore, please point me
to them.
I thought I would share how I managed to get a new Jewel cluster with
bluestore-based OSDs to work.
What I have found so far is that ceph-disk can create new bluestore OSDs
(but not ceph-deploy, please correct me if I'm wrong), and that I need to
have "enable experimental unrecoverable data corrupting features =
bluestore rocksdb" in the [global] section of ceph.conf.
After that I can create new OSDs with:
ceph-disk prepare --bluestore /dev/sdg
So I created a cluster with ceph-deploy without any OSDs and then
used ceph-disk on the hosts to create the OSDs.
Pretty simple in the end, but it took me a while to figure that out.
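To sum it up, the rough sequence I used was something like this (the
monitor hostname and the device are just examples from my setup, and the
usual ceph-deploy install/admin steps are left out):
ceph-deploy new mon1
# edit the generated ceph.conf: add the experimental flag above
# (and osd objectstore = bluestore) under [global]
ceph-deploy mon create-initial
# then, on each OSD host:
ceph-disk prepare --bluestore /dev/sdg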
[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
# enable experimental unrecoverable data corrupting features = bluestore rocksdb ms-type-async
osd objectstore = bluestore
# bluestore sync wal apply = false
# bluestore overlay max = 0
# bluestore_wal_threads = 8
# rocksdb_write_buffer_size = 536870912
# rocksdb_write_buffer_num = 4
# rocksdb_min_write_buffer_number_to_merge = 2
rocksdb_log = /tmp/cbt/ceph/log/rocksdb.log
# rocksdb_max_background_compactions = 4
# rocksdb_compaction_threads = 4
# rocksdb_level0_file_num_compaction_trigger = 4
# rocksdb_max_bytes_for_level_base = 104857600 //100MB
# rocksdb_target_file_size_base = 10485760 //10MB
# rocksdb_num_levels = 3
# rocksdb_compression = none
# bluestore_min_alloc_size = 32768
# ms_type = async
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
bluestore default buffered read = true
osd pool default size = 1
osd crush chooseleaf type = 0
keyring = /tmp/cbt/ceph/keyring
osd pg bits = 8
osd pgp bits = 8
auth supported = none
log to syslog = false
log file = /tmp/cbt/ceph/log/$name.log
filestore xattr use omap = true
auth cluster required = none
auth service required = none
auth client required = none
public network = 10.0.10.0/24
cluster network = 10.0.10.0/24
rbd cache = true
rbd cache writethrough until flush = false
osd scrub load threshold = 0.01
osd scrub min interval = 137438953472
osd scrub max interval = 137438953472
osd deep scrub interval = 137438953472
osd max scrubs = 16
filestore merge threshold = 40
filestore split multiple = 8
osd op threads = 8
debug_bluefs = "0/0"
debug_bluestore = "0/0"
debug_bdev = "0/0"
# debug_bluefs = "20"
# debug_bluestore = "30"
# debug_bdev = "20"
debug_lockdep = "0/0"
debug_context = "0/0"
debug_crush = "0/0"
debug_mds = "0/0"
debug_mds_balancer = "0/0"
debug_mds_locker = "0/0"
debug_mds_log = "0/0"
debug_mds_log_expire = "0/0"
debug_mds_migrator = "0/0"
debug_buffer = "0/0"
debug_timer = "0/0"
debug_filer = "0/0"
debug_objecter = "0/0"
debug_rados = "0/0"
debug_rbd = "0/0"
debug_journaler = "0/0"
debug_objectcacher = "0/0"
debug_client = "0/0"
debug_osd = "0/0"
# debug_osd = "30"
debug_optracker = "0/0"
debug_objclass = "0/0"
debug_filestore = "0/0"
debug_journal = "0/0"
debug_ms = "0/0"
# debug_ms = 1
debug_mon = "0/0"
debug_monc = "0/0"
debug_paxos = "0/0"
debug_tp = "0/0"
debug_auth = "0/0"
debug_finisher = "0/0"
debug_heartbeatmap = "0/0"
debug_perfcounter = "0/0"
debug_rgw = "0/0"
debug_hadoop = "0/0"
debug_asok = "0/0"
debug_throttle = "0/0"
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
[client]
log_file = /var/log/ceph/ceph-rbd.log
admin_socket = /var/run/ceph/ceph-rbd.asok
[mon]
mon data = /tmp/cbt/ceph/mon.$id
[mon.a]
host = incerta01.front.sepia.ceph.com
mon addr = 10.0.10.101:6789
[osd.0]
host = incerta01.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
[osd.1]
host = incerta01.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-1-data
osd journal = /dev/disk/by-partlabel/osd-device-1-journal
[osd.2]
host = incerta01.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-2-data
osd journal = /dev/disk/by-partlabel/osd-device-2-journal
[osd.3]
host = incerta01.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-3-data
osd journal = /dev/disk/by-partlabel/osd-device-3-journal
[osd.4]
host = incerta02.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
[osd.5]
host = incerta02.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-1-data
osd journal = /dev/disk/by-partlabel/osd-device-1-journal
[osd.6]
host = incerta02.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-2-data
osd journal = /dev/disk/by-partlabel/osd-device-2-journal
[osd.7]
host = incerta02.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-3-data
osd journal = /dev/disk/by-partlabel/osd-device-3-journal
[osd.8]
host = incerta03.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
[osd.9]
host = incerta03.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-1-data
osd journal = /dev/disk/by-partlabel/osd-device-1-journal
[osd.10]
host = incerta03.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-2-data
osd journal = /dev/disk/by-partlabel/osd-device-2-journal
[osd.11]
host = incerta03.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-3-data
osd journal = /dev/disk/by-partlabel/osd-device-3-journal
[osd.12]
host = incerta04.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
[osd.13]
host = incerta04.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-1-data
osd journal = /dev/disk/by-partlabel/osd-device-1-journal
[osd.14]
host = incerta04.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-2-data
osd journal = /dev/disk/by-partlabel/osd-device-2-journal
[osd.15]
host = incerta04.front.sepia.ceph.com
osd data = /tmp/cbt/mnt/osd-device-3-data
osd journal = /dev/disk/by-partlabel/osd-device-3-journal
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com