Re: RBD Block performance vs rbd mount as filesystem

Alexandre DERUMIER <aderumier@xxxxxxxxx> · Mon, 7 Nov 2016 07:49:01 +0100 (CET)

Also, if you really want to get more IOPS from QEMU,

you can use multiple disks with iothreads enabled.
(I'm able to get 50-60k IOPS of 4k random reads per disk, and up to 450k IOPS with 9 disks.)

In the future, QEMU should make it possible to use multiple iothreads for a single disk.
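A minimal sketch of the multi-disk layout described above, assuming QEMU's `-object iothread,id=...` and `virtio-blk-pci,...,iothread=...` syntax together with the `rbd:<pool>/<image>` drive notation of QEMU's rbd driver. The pool name `rbd` and the image names are hypothetical. It only builds the argument list: each disk gets its own iothread, so its I/O is handled outside QEMU's main loop. With linear scaling, 9 disks at about 50k IOPS each would give 9 × 50k = 450k, which matches the figure above.

```python
from typing import List


def qemu_disk_args(images: List[str], pool: str = "rbd") -> List[str]:
    """Build QEMU arguments that attach each RBD image as its own
    virtio-blk device, each served by a dedicated iothread."""
    args: List[str] = []
    for i, image in enumerate(images):
        # One iothread object per disk.
        args += ["-object", f"iothread,id=iothread{i}"]
        # librbd-backed drive; "rbd:<pool>/<image>" is QEMU's rbd driver syntax.
        args += ["-drive", f"file=rbd:{pool}/{image},format=raw,if=none,id=drive{i}"]
        # Bind the virtio-blk device to its own iothread.
        args += ["-device", f"virtio-blk-pci,drive=drive{i},iothread=iothread{i}"]
    return args


if __name__ == "__main__":
    # Hypothetical image names; a real VM also needs CPU, memory, network, etc.
    images = [f"vm-disk-{n}" for n in range(9)]
    print(" ".join(["qemu-system-x86_64"] + qemu_disk_args(images)))
```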


----- Original Message -----
From: "aderumier" <aderumier@xxxxxxxxx>
To: "Bill WONG" <wongahshuen@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 7 November 2016 07:46:16
Subject: Re: RBD Block performance vs rbd mount as filesystem

>>Is there any documentation on how I can compile Ceph with jemalloc as well? It looks like Ceph performs much better with jemalloc too.

Simply build Ceph with --with-jemalloc. (I'm seeing improvements at really high IOPS: at around 300k IOPS tcmalloc becomes the limit, and with jemalloc I'm at around 450k IOPS.)


Here are my Debian packaging changes:

diff --git a/debian/control b/debian/control
index 3e03689..ab23b3b 100644
--- a/debian/control
+++ b/debian/control
@@ -38,7 +38,6 @@ Build-Depends: autoconf,
 libexpat1-dev,
 libfcgi-dev,
 libfuse-dev,
- libgoogle-perftools-dev [i386 amd64 arm64],
 libkeyutils-dev,
 libleveldb-dev,
 libnss3-dev,

diff --git a/debian/rules b/debian/rules 
index b705dd6..7db5b9a 100755 
--- a/debian/rules 
+++ b/debian/rules 
@@ -23,7 +23,7 @@ export DEB_HOST_ARCH ?= $(shell dpkg-architecture -qDEB_HOST_ARCH) 
 extraopts += --with-ocf --with-nss
 extraopts += --with-debug
 extraopts += --enable-cephfs-java
-
+extraopts += --with-jemalloc
 # rocksdb is not packaged by anyone. build it if we can.
 extraopts += --with-librocksdb-static=check

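After rebuilding, one way to confirm which allocator a binary actually links is to inspect its `ldd` output. The sketch below assumes the allocator is linked dynamically (so it shows up as `libjemalloc.so.*` or `libtcmalloc*.so.*`) and that `/usr/bin/ceph-osd` is the OSD binary path; both are assumptions, not something stated in this thread. The same check can be pointed at a QEMU binary built with `--enable-jemalloc`.

```python
import re
import subprocess
import sys


def detect_allocator(ldd_output: str) -> str:
    """Classify `ldd` output as 'jemalloc', 'tcmalloc', 'both' or 'unknown'.

    Only dynamically linked allocators appear in ldd output; a statically
    linked allocator would be reported as 'unknown'.
    """
    has_je = re.search(r"\blibjemalloc\.so", ldd_output) is not None
    # Matches both libtcmalloc.so and libtcmalloc_minimal.so.
    has_tc = re.search(r"\blibtcmalloc(_minimal)?\.so", ldd_output) is not None
    if has_je and has_tc:
        return "both"
    if has_je:
        return "jemalloc"
    if has_tc:
        return "tcmalloc"
    return "unknown"


if __name__ == "__main__":
    # Default path is an assumption; pass any binary to check as the first argument.
    binary = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/ceph-osd"
    out = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True).stdout
    print(binary, "->", detect_allocator(out))
```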

>>And what's the side effect of debug ms = 0/0?

I don't see any side effects; you simply won't have messenger debug information. (But as you are in production, that shouldn't be a problem.)
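For reference, Ceph debug settings take the form `log_level/memory_level`: the first number controls what is written to the log file, the second what is kept in the in-memory log that Ceph dumps on a crash. So `debug ms = 0/0` turns off both for the messenger (`ms`) subsystem. The sketch below parses that notation; treating a single value as applying to both levels is our reading of Ceph's behavior and is stated here as an assumption.

```python
from typing import NamedTuple


class DebugLevel(NamedTuple):
    log: int     # level written to the log file
    memory: int  # level kept in the in-memory log of recent events


def parse_debug_level(value: str) -> DebugLevel:
    """Parse a Ceph-style debug value such as "0/0", "1/5" or "5"."""
    parts = value.strip().split("/")
    if len(parts) == 1:
        # Assumption: a single value sets both levels.
        level = int(parts[0])
        return DebugLevel(level, level)
    if len(parts) == 2:
        return DebugLevel(int(parts[0]), int(parts[1]))
    raise ValueError(f"unexpected debug level: {value!r}")


if __name__ == "__main__":
    for v in ("0/0", "1/5", "5"):
        print(v, "->", parse_debug_level(v))
```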


>>And it looks like disabling cephx auth is not good for production use... does cephx affect performance a lot?

For me, there is still a 10-20% difference with cephx.
If you only use your Ceph cluster for your QEMU cluster, I don't see any problem with disabling it
(provided, of course, that your Ceph cluster is firewalled, or network access is only available to your QEMU clients).

Note that changing it online is not possible, so you need to shut down all the clients before making this change.
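Disabling cephx means setting the three `auth_*_required` options to `none` (the values quoted later in this thread are all `cephx`). The sketch below only rewrites the `[global]` section of a ceph.conf given as text. It assumes a simple INI layout that Python's `configparser` can read, it does not preserve comments, and it does nothing about distributing the file or restarting daemons, which still has to happen with all clients shut down as described above.

```python
import configparser
import io

AUTH_KEYS = ("auth_cluster_required", "auth_service_required", "auth_client_required")


def disable_cephx(conf_text: str) -> str:
    """Return ceph.conf text with cephx authentication set to 'none' in [global]."""
    # interpolation=None keeps values such as "ceph-$id" untouched.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        parser.add_section("global")
    glob = parser["global"]
    for key in AUTH_KEYS:
        # Ceph accepts spaces in place of underscores; drop the spaced variant
        # so that only one spelling of each option remains.
        spaced = key.replace("_", " ")
        if spaced in glob:
            del glob[spaced]
        glob[key] = "none"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```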



----- Original Message -----
From: "Bill WONG" <wongahshuen@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "dillaman" <dillaman@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 7 November 2016 06:35:38
Subject: Re: RBD Block performance vs rbd mount as filesystem

Hi Alexandre,
Thank you!
Is there any documentation on how I can compile Ceph with jemalloc as well? It looks like Ceph performs much better with jemalloc too.
And what's the side effect of debug ms = 0/0? It also looks like disabling cephx auth is not good for production use... does cephx affect performance a lot?


On Sat, Nov 5, 2016 at 5:55 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:


Here are some tips I use to improve librbd and QEMU performance:

- disable cephx auth

- disable debug_ms (I jumped from 30k IOPS to 45k IOPS with 4k randread):

[global] 

debug ms = 0/0 


- compile QEMU with jemalloc (--enable-jemalloc)
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html



----- Original Message -----
From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
To: "Bill WONG" <wongahshuen@xxxxxxxxx>
Cc: "aderumier" <aderumier@xxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 1 November 2016 02:06:22
Subject: Re: RBD Block performance vs rbd mount as filesystem

For better or worse, I can repeat your "ioping" findings against a 
qcow2 image hosted on a krbd-backed volume. The "bad" news is that it 
actually isn't even sending any data to the OSDs -- which is why your 
latency is shockingly low. When performing a "dd ... oflag=dsync" 
against the krbd-backed qcow2 image, I can see lots of IO being 
coalesced from 4K writes into larger writes, which is artificially 
inflating the stats. 
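To illustrate why coalescing inflates the numbers: if many small sequential writes are merged before they reach the backend, the backend sees far fewer (but larger) requests than the benchmark issued, so per-request latency looks much better than it is. The sketch below is a simplified model of that merging (only exactly contiguous writes are merged, up to a hypothetical `max_bytes` request-size cap); it is not how the kernel block layer or qcow2 actually implement it.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes


def coalesce(writes: List[Extent], max_bytes: int = 1 << 20) -> List[Extent]:
    """Merge writes that directly follow the previous merged extent,
    never letting a merged extent grow beyond max_bytes."""
    merged: List[Extent] = []
    for off, length in sorted(writes):
        if merged:
            last_off, last_len = merged[-1]
            if off == last_off + last_len and last_len + length <= max_bytes:
                merged[-1] = (last_off, last_len + length)
                continue
        merged.append((off, length))
    return merged


if __name__ == "__main__":
    # 256 sequential 4 KiB writes, as a dd-style test might issue them.
    writes = [(i * 4096, 4096) for i in range(256)]
    merged = coalesce(writes)
    print(len(writes), "writes ->", len(merged), "backend request(s)")
```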



On Mon, Oct 31, 2016 at 11:08 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
> Hi Jason, 
> 
> It looks like the situation is the same; there is no difference. My ceph.conf is below.
> Any comments or improvements required?
> --- 
> [global] 
> fsid = 106a12b0-5ed0-4a71-b6aa-68a09088ec33 
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3 
> mon_host = 192.168.8.11,192.168.8.12,192.168.8.13 
> auth_cluster_required = cephx 
> auth_service_required = cephx 
> auth_client_required = cephx 
> filestore_xattr_use_omap = true 
> osd pool default size = 3 
> osd pool default min size = 1 
> osd pool default pg num = 4096 
> osd pool default pgp num = 4096 
> osd_crush_chooseleaf_type = 1 
> mon_pg_warn_max_per_osd = 0 
> max_open_files = 131072 
> 
> [mon] 
> mon_data = /var/lib/ceph/mon/ceph-$id 
> 
> mon clock drift allowed = 2 
> mon clock drift warn backoff = 30 
> 
> [osd] 
> osd_data = /var/lib/ceph/osd/ceph-$id 
> osd_journal_size = 20000 
> osd_mkfs_type = xfs 
> osd_mkfs_options_xfs = -f 
> filestore_xattr_use_omap = true 
> filestore_min_sync_interval = 10 
> filestore_max_sync_interval = 15 
> filestore_queue_max_ops = 25000 
> filestore_queue_max_bytes = 10485760 
> filestore_queue_committing_max_ops = 5000 
> filestore_queue_committing_max_bytes = 10485760000 
> journal_max_write_bytes = 1073714824 
> journal_max_write_entries = 10000 
> journal_queue_max_ops = 50000 
> journal_queue_max_bytes = 10485760000 
> osd_max_write_size = 512 
> osd_client_message_size_cap = 2147483648 
> osd_deep_scrub_stride = 131072 
> osd_op_threads = 8 
> osd_disk_threads = 4 
> osd_map_cache_size = 1024 
> osd_map_cache_bl_size = 128 
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier" 
> osd_recovery_op_priority = 4 
> osd_recovery_max_active = 10 
> osd_max_backfills = 4 
> rbd non blocking aio = false 
> 
> [client] 
> rbd_cache = true 
> rbd_cache_size = 268435456 
> rbd_cache_max_dirty = 134217728 
> rbd_cache_max_dirty_age = 5 
> --- 
> 
> 
> 
> On Mon, Oct 31, 2016 at 9:20 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> 
>> On Sun, Oct 30, 2016 at 5:40 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
>> > Any ideas or comments?
>> 
>> Can you set "rbd non blocking aio = false" in your ceph.conf and retry
>> librbd? This will eliminate at least one context switch on the read IO
>> path, which results in increased latency at extremely low queue
>> depths.
>> 
>> -- 
>> Jason 
> 
> 
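Jason's suggestion is a librbd (client-side) option, while in the config quoted above `rbd non blocking aio = false` sits under `[osd]`. Assuming Ceph's usual section lookup, a librbd client such as QEMU reads `[global]`, `[client]` and `[client.<id>]` but not `[osd]`, so the setting would likely not reach QEMU from there. The sketch below flags `rbd*` options placed in sections a given client would not read; `client.admin` as the client name is an assumption.

```python
import configparser
import sys
from typing import Dict, List


def normalize(key: str) -> str:
    # Ceph treats spaces and underscores in option names as equivalent.
    return "_".join(key.split()).lower()


def misplaced_rbd_options(conf_text: str, client: str = "client.admin") -> Dict[str, List[str]]:
    """Map each section a librbd client would NOT read to the rbd_* options found there."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Assumed set of sections read by a librbd client.
    visible = {"global", "client", client}
    found: Dict[str, List[str]] = {}
    for section in parser.sections():
        if section in visible:
            continue
        keys = [normalize(k) for k in parser[section] if normalize(k).startswith("rbd_")]
        if keys:
            found[section] = keys
    return found


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/ceph/ceph.conf"
    with open(path) as f:
        print(misplaced_rbd_options(f.read()))
```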



-- 
Jason 
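Jason's point about extremely low queue depths can be made concrete with Little's law (IOPS = outstanding I/Os / per-I/O latency). At queue depth 1, which is what `ioping` measures, every microsecond added to the I/O path directly reduces the achievable IOPS. The latencies below are hypothetical placeholders, not measurements from this thread.

```python
def iops(queue_depth: int, latency_us: float) -> float:
    """Little's law: completed I/Os per second = outstanding I/Os / latency."""
    return queue_depth / (latency_us / 1_000_000)


def relative_loss(base_latency_us: float, added_latency_us: float) -> float:
    """Fraction of QD1 IOPS lost when a fixed latency is added to every I/O."""
    return 1 - iops(1, base_latency_us + added_latency_us) / iops(1, base_latency_us)


if __name__ == "__main__":
    base, extra = 500.0, 50.0  # hypothetical per-I/O latencies in microseconds
    print(f"QD1: {iops(1, base):.0f} -> {iops(1, base + extra):.0f} IOPS")
    print(f"loss: {relative_loss(base, extra):.1%}")
```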





_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
