>> if I compiled Ceph from source, then I cannot use ceph-deploy to install the cluster; everything needs to be handled by myself. As I am running CentOS 7, it looks like Ceph suggests using ceph-deploy to deploy the cluster. Is there any pre-compiled package with --with-jemalloc enabled by default?

You can use ceph-deploy to deploy the first time with the Ceph repo, then reinstall the jemalloc package on top.

Or you can build your own repository and tell ceph-deploy install to use it:

ceph-deploy install --repo-url http://my.repo.com/debian-jewel/

But yes, you'll need to build packages manually each time Ceph releases a new version.
(I'm still hoping to have an official ceph-jemalloc repo some day.)

----- Original Message -----
From: "Bill WONG" <wongahshuen@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "dillaman" <dillaman@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 7 November 2016 11:28:02
Subject: Re: RBD Block performance vs rbd mount as filesystem

Hi Alexandre,

if I compiled Ceph from source, then I cannot use ceph-deploy to install the cluster; everything needs to be handled by myself. As I am running CentOS 7, it looks like Ceph suggests using ceph-deploy to deploy the cluster. Is there any pre-compiled package with --with-jemalloc enabled by default?

On Mon, Nov 7, 2016 at 2:46 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:

>> Is there any document you can provide on how I can compile Ceph with jemalloc as well? It looks like Ceph with jemalloc gives much better performance too.

Simply build Ceph with --with-jemalloc.

(I'm seeing improvements at really high IOPS: at something like 300k IOPS tcmalloc becomes the limit, while with jemalloc I'm around 450k IOPS.)

Here is my Debian package rules change:

diff --git a/debian/control b/debian/control
index 3e03689..ab23b3b 100644
--- a/debian/control
+++ b/debian/control
@@ -38,7 +38,6 @@ Build-Depends: autoconf,
  libexpat1-dev,
  libfcgi-dev,
  libfuse-dev,
- libgoogle-perftools-dev [i386 amd64 arm64],
  libkeyutils-dev,
  libleveldb-dev,
  libnss3-dev,
diff --git a/debian/rules b/debian/rules
index b705dd6..7db5b9a 100755
--- a/debian/rules
+++ b/debian/rules
@@ -23,7 +23,7 @@ export DEB_HOST_ARCH ?= $(shell dpkg-architecture -qDEB_HOST_ARCH)
 extraopts += --with-ocf --with-nss
 extraopts += --with-debug
 extraopts += --enable-cephfs-java
-
+extraopts += --with-jemalloc
 # rocksdb is not packaged by anyone. build it if we can.
 extraopts += --with-librocksdb-static=check

>> And what's the side effect of debug ms = 0/0?

I don't see any side effect; you just won't have debug information.
(But since you are in production, that shouldn't be a problem.)

>> And it looks like disabling cephx auth is no good for production use.... does cephx affect performance a lot?

For me, I still see a 10-20% difference with cephx.
If you only use your Ceph cluster for your QEMU cluster, I don't see any problem with disabling it (provided, of course, that your Ceph cluster is firewalled or its network is only reachable by your QEMU clients).

Note that changing it online is not possible, so you need to shut down all the clients before making this change.
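For reference, a minimal ceph.conf sketch of the two changes discussed in this mail (silencing the messenger debug logs and disabling cephx) could look like the snippet below. The option names are the standard ones, but double-check them against your release, and remember that the auth change only takes effect after every daemon and client has been restarted.

[global]
# turn off messenger debug logging
debug ms = 0/0

# disable cephx entirely (only on a firewalled / private cluster network;
# cannot be changed online: stop the clients, then restart all daemons)
auth_cluster_required = none
auth_service_required = none
auth_client_required = none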
----- Original Message -----
From: "Bill WONG" <wongahshuen@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "dillaman" <dillaman@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 7 November 2016 06:35:38
Subject: Re: RBD Block performance vs rbd mount as filesystem

Hi Alexandre,

thank you! Is there any document you can provide on how I can compile Ceph with jemalloc as well? It looks like Ceph with jemalloc gives much better performance too.

And what's the side effect of debug ms = 0/0?

And it looks like disabling cephx auth is no good for production use.... does cephx affect performance a lot?

On Sat, Nov 5, 2016 at 5:55 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:

Here are some tips I use to improve librbd && QEMU performance:

- disable cephx auth

- disable debug_ms (I'm jumping from 30k IOPS to 45k IOPS with 4k randread):

  [global]
  debug ms = 0/0

- compile QEMU with jemalloc (--enable-jemalloc):
  https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html

----- Original Message -----
From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
To: "Bill WONG" <wongahshuen@xxxxxxxxx>
Cc: "aderumier" <aderumier@xxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 1 November 2016 02:06:22
Subject: Re: RBD Block performance vs rbd mount as filesystem

For better or worse, I can repeat your "ioping" findings against a qcow2 image hosted on a krbd-backed volume. The "bad" news is that it actually isn't even sending any data to the OSDs, which is why your latency is shockingly low. When performing a "dd ... oflag=dsync" against the krbd-backed qcow2 image, I can see lots of IO being coalesced from 4K writes into larger writes, which is artificially inflating the stats.

On Mon, Oct 31, 2016 at 11:08 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
> Hi Jason,
>
> it looks like the situation is the same, no difference. My ceph.conf is below;
> any comments or improvements required?
> ---
> [global]
> fsid = 106a12b0-5ed0-4a71-b6aa-68a09088ec33
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
> mon_host = 192.168.8.11,192.168.8.12,192.168.8.13
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 4096
> osd pool default pgp num = 4096
> osd_crush_chooseleaf_type = 1
> mon_pg_warn_max_per_osd = 0
> max_open_files = 131072
>
> [mon]
> mon_data = /var/lib/ceph/mon/ceph-$id
>
> mon clock drift allowed = 2
> mon clock drift warn backoff = 30
>
> [osd]
> osd_data = /var/lib/ceph/osd/ceph-$id
> osd_journal_size = 20000
> osd_mkfs_type = xfs
> osd_mkfs_options_xfs = -f
> filestore_xattr_use_omap = true
> filestore_min_sync_interval = 10
> filestore_max_sync_interval = 15
> filestore_queue_max_ops = 25000
> filestore_queue_max_bytes = 10485760
> filestore_queue_committing_max_ops = 5000
> filestore_queue_committing_max_bytes = 10485760000
> journal_max_write_bytes = 1073714824
> journal_max_write_entries = 10000
> journal_queue_max_ops = 50000
> journal_queue_max_bytes = 10485760000
> osd_max_write_size = 512
> osd_client_message_size_cap = 2147483648
> osd_deep_scrub_stride = 131072
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_map_cache_size = 1024
> osd_map_cache_bl_size = 128
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
> osd_recovery_op_priority = 4
> osd_recovery_max_active = 10
> osd_max_backfills = 4
> rbd non blocking aio = false
>
> [client]
> rbd_cache = true
> rbd_cache_size = 268435456
> rbd_cache_max_dirty = 134217728
> rbd_cache_max_dirty_age = 5
> ---
>
>
> On Mon, Oct 31, 2016 at 9:20 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>
>> On Sun, Oct 30, 2016 at 5:40 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
>> > any ideas or comments?
>>
>> Can you set "rbd non blocking aio = false" in your ceph.conf and retry
>> librbd? This will eliminate at least one context switch on the read IO
>> path, which results in increased latency under extremely low queue
>> depths.
>>
>> --
>> Jason
>
>

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
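A footnote to the benchmarking discussion in this thread: since ioping and dd against a qcow2 file on a krbd volume can be flattered by write coalescing, a fio job using its rbd ioengine exercises librbd directly and avoids that artifact. Below is a minimal sketch only; the pool, image, and client names are placeholders, and the test image must be created beforehand (for example with "rbd create fio-test --size 10240" in the chosen pool).

; fio job sketch: 4k random reads straight through librbd
; (pool, image, and client names are placeholders; create the image first)
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
invalidate=0
rw=randread
bs=4k
runtime=60
time_based

[qd1-latency]
iodepth=1

[qd32-throughput]
stonewall
iodepth=32

The iodepth=1 job shows per-request latency, which is what the ioping comparison was trying to capture; the iodepth=32 job shows how throughput scales once the queue depth rises, which is where the tcmalloc/jemalloc and debug_ms differences discussed earlier tend to show up.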