Re: Using bluestore in Jewel 10.0.4

Hi Folks,

We are actually in the middle of doing some bluestore testing/tuning for the upstream Jewel release as we speak. :) These are (so far) pure HDD tests using 4 nodes with 4 spinning disks each and no SSDs.

Basically, on the write side it's looking fantastic, and that's an area we really wanted to improve, so that's great. On the read side, we are working on getting sequential read performance up for certain IO sizes. We are more dependent on client-side readahead with bluestore since there is no underlying filesystem below the OSDs helping us out. This usually isn't a problem in practice since there should be readahead in the VM, but when testing with fio using the RBD engine you should probably enable client-side RBD readahead:

rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304

Again, this probably only matters when directly using librbd.
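
For reference, here's roughly the kind of fio job we use for that librbd case. This is only a sketch: the pool and image names are placeholders (create the image first with rbd create), and the rbd ioengine needs a fio build with librbd support.

[seq-read]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
rw=read
bs=128k
iodepth=16
time_based
runtime=60

With the two rbd readahead settings above in the client's ceph.conf, librbd prefetches ahead of the sequential stream, which stands in for the filesystem readahead we no longer get underneath the OSDs.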

The other question is using default buffered reads in bluestore, ie setting:

"bluestore default buffered read = true"

That's what we are working on testing now. I've included the ceph.conf used for these tests and also a link to some of our recent results. Please download the spreadsheet and open it in LibreOffice, as Google's preview doesn't show the graphs.

Here's how the legend is set up:

Hammer-FS: Hammer + Filestore
6dba7fd-BS (No RBD RA): Master + Fixes + Bluestore
6dba7fd-BS (4M RBD RA): Master + Fixes + Bluestore + 4M RBD Read Ahead
c1e41afb-FS: Master + Filestore + new journal throttling + Sam's tuning

https://drive.google.com/file/d/0B2gTBZrkrnpZMl9OZ18yS3NuZEU/view?usp=sharing

Mark

On 03/14/2016 11:04 AM, Kenneth Waegeman wrote:
Hi Stefan,

We are also interested in bluestore, but have not yet looked into it.

We tried keyvaluestore before; it could be enabled by setting the
osd objectstore value.
In this ticket http://tracker.ceph.com/issues/13942 I see:

[global]
         enable experimental unrecoverable data corrupting features = *
         bluestore fsck on mount = true
         bluestore block db size = 67108864
         bluestore block wal size = 134217728
         bluestore block size = 5368709120
         osd objectstore = bluestore

So I guess this could work for bluestore too.
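
Once an OSD is up you can also check which backend it actually ended
up with from its metadata; just a sketch, assuming the
osd_objectstore field is reported on your release:

ceph osd metadata 0 | grep osd_objectstore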

Very curious to hear what you see stability- and performance-wise :)

Cheers,
Kenneth

On 14/03/16 16:03, Stefan Lissmats wrote:
Hello everyone!

I think the new bluestore sounds great and I wanted to try it out in
my test environment, but I couldn't find much on how to use it. I
finally managed to test it, and it really looks promising
performance-wise.
If anyone has more information or guides for bluestore, please tell
me where to find them.

I thought I would share how I managed to get a new Jewel cluster with
bluestore-based OSDs to work.


What I found so far is that ceph-disk can create new bluestore OSDs
(but not ceph-deploy, please correct me if I'm wrong), and I need to
have "enable experimental unrecoverable data corrupting features =
bluestore rocksdb" in the global section of ceph.conf.
After that I can create new OSDs with:

ceph-disk prepare --bluestore /dev/sdg

So I created a cluster with ceph-deploy without any OSDs and then
used ceph-disk on the hosts to create the OSDs.

Pretty simple in the end, but it took me a while to figure that out.
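
For anyone wanting to reproduce this, the overall flow is roughly as
below. This is only a sketch, not the exact commands verbatim: the
hostname is a placeholder, and on many systems udev triggers the
activate step automatically after prepare.

# on the admin node: create the cluster without any OSDs
ceph-deploy new node1
# add to [global] in the generated ceph.conf (as noted above):
#   enable experimental unrecoverable data corrupting features = bluestore rocksdb
ceph-deploy mon create-initial
ceph-deploy admin node1

# on each OSD host: prepare (and, if needed, activate) the bluestore OSD
ceph-disk prepare --bluestore /dev/sdg
ceph-disk activate /dev/sdg1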



[global]
        enable experimental unrecoverable data corrupting features = bluestore rocksdb
#        enable experimental unrecoverable data corrupting features = bluestore rocksdb ms-type-async
        osd objectstore = bluestore
#        bluestore sync wal apply = false
#        bluestore overlay max = 0
#        bluestore_wal_threads = 8
#        rocksdb_write_buffer_size = 536870912
#        rocksdb_write_buffer_num = 4
#        rocksdb_min_write_buffer_number_to_merge = 2
        rocksdb_log = /tmp/cbt/ceph/log/rocksdb.log
#        rocksdb_max_background_compactions = 4
#        rocksdb_compaction_threads = 4
#        rocksdb_level0_file_num_compaction_trigger = 4
#        rocksdb_max_bytes_for_level_base = 104857600 //100MB
#        rocksdb_target_file_size_base = 10485760      //10MB
#        rocksdb_num_levels = 3
#        rocksdb_compression = none
#        bluestore_min_alloc_size = 32768

#        ms_type = async

        rbd readahead disable after bytes = 0
        rbd readahead max bytes = 4194304
        bluestore default buffered read = true
        osd pool default size = 1
        osd crush chooseleaf type = 0

        keyring = /tmp/cbt/ceph/keyring
        osd pg bits = 8
        osd pgp bits = 8
        auth supported = none
        log to syslog = false
        log file = /tmp/cbt/ceph/log/$name.log
        filestore xattr use omap = true
        auth cluster required = none
        auth service required = none
        auth client required = none

        public network = 10.0.10.0/24
        cluster network = 10.0.10.0/24
        rbd cache = true
        rbd cache writethrough until flush = false
        osd scrub load threshold = 0.01
        osd scrub min interval = 137438953472
        osd scrub max interval = 137438953472
        osd deep scrub interval = 137438953472
        osd max scrubs = 16

        filestore merge threshold = 40
        filestore split multiple = 8
        osd op threads = 8

        debug_bluefs = "0/0"
        debug_bluestore = "0/0"
        debug_bdev = "0/0"
#        debug_bluefs = "20"
#        debug_bluestore = "30"
#        debug_bdev = "20"
        debug_lockdep = "0/0" 
        debug_context = "0/0"
        debug_crush = "0/0"
        debug_mds = "0/0"
        debug_mds_balancer = "0/0"
        debug_mds_locker = "0/0"
        debug_mds_log = "0/0"
        debug_mds_log_expire = "0/0"
        debug_mds_migrator = "0/0"
        debug_buffer = "0/0"
        debug_timer = "0/0"
        debug_filer = "0/0"
        debug_objecter = "0/0"
        debug_rados = "0/0"
        debug_rbd = "0/0"
        debug_journaler = "0/0"
        debug_objectcacher = "0/0"
        debug_client = "0/0"
        debug_osd = "0/0"
#        debug_osd = "30"
        debug_optracker = "0/0"
        debug_objclass = "0/0"
        debug_filestore = "0/0"
        debug_journal = "0/0"
        debug_ms = "0/0"
#        debug_ms = 1
        debug_mon = "0/0"
        debug_monc = "0/0"
        debug_paxos = "0/0"
        debug_tp = "0/0"
        debug_auth = "0/0"
        debug_finisher = "0/0"
        debug_heartbeatmap = "0/0"
        debug_perfcounter = "0/0"
        debug_rgw = "0/0"
        debug_hadoop = "0/0"
        debug_asok = "0/0"
        debug_throttle = "0/0"

        mon pg warn max object skew = 100000
        mon pg warn min per osd = 0
        mon pg warn max per osd = 32768

[client]
        log_file = /var/log/ceph/ceph-rbd.log
        admin_socket = /var/run/ceph/ceph-rbd.asok

[mon]
        mon data = /tmp/cbt/ceph/mon.$id

[mon.a]
        host = incerta01.front.sepia.ceph.com
        mon addr = 10.0.10.101:6789

[osd.0]
        host = incerta01.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal

[osd.1]
        host = incerta01.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-1-data
        osd journal = /dev/disk/by-partlabel/osd-device-1-journal

[osd.2]
        host = incerta01.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-2-data
        osd journal = /dev/disk/by-partlabel/osd-device-2-journal

[osd.3]
        host = incerta01.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-3-data
        osd journal = /dev/disk/by-partlabel/osd-device-3-journal

[osd.4]
        host = incerta02.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal

[osd.5]
        host = incerta02.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-1-data
        osd journal = /dev/disk/by-partlabel/osd-device-1-journal

[osd.6]
        host = incerta02.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-2-data
        osd journal = /dev/disk/by-partlabel/osd-device-2-journal

[osd.7]
        host = incerta02.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-3-data
        osd journal = /dev/disk/by-partlabel/osd-device-3-journal

[osd.8]
        host = incerta03.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal

[osd.9]
        host = incerta03.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-1-data
        osd journal = /dev/disk/by-partlabel/osd-device-1-journal

[osd.10]
        host = incerta03.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-2-data
        osd journal = /dev/disk/by-partlabel/osd-device-2-journal

[osd.11]
        host = incerta03.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-3-data
        osd journal = /dev/disk/by-partlabel/osd-device-3-journal

[osd.12]
        host = incerta04.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal

[osd.13]
        host = incerta04.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-1-data
        osd journal = /dev/disk/by-partlabel/osd-device-1-journal

[osd.14]
        host = incerta04.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-2-data
        osd journal = /dev/disk/by-partlabel/osd-device-2-journal

[osd.15]
        host = incerta04.front.sepia.ceph.com
        osd data = /tmp/cbt/mnt/osd-device-3-data
        osd journal = /dev/disk/by-partlabel/osd-device-3-journal

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
