Re: RBD Block performance vs rbd mount as filesystem

Hi Alexandre,

If I compile Ceph from source, then I cannot use ceph-deploy to install the cluster and have to handle everything myself. I am running CentOS 7, and it looks like Ceph recommends using ceph-deploy to deploy the cluster. Is there any pre-compiled package, or one built with --with-jemalloc enabled by default?


On Mon, Nov 7, 2016 at 2:46 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>Is there any document on how I can compile Ceph with jemalloc as well? It looks like Ceph with jemalloc gives much better performance too.

Simply build Ceph with --with-jemalloc. (I'm seeing improvements at really high IOPS: at around 300k IOPS tcmalloc becomes the limiting factor, while with jemalloc I'm at around 450k IOPS.)
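
If you build from source directly (for example on CentOS), the same configure flag applies. A rough sketch only, assuming the autotools-based tree of this Ceph release and that the jemalloc development package is already installed:

./autogen.sh
./configure --with-jemalloc
make -j$(nproc)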


Here is my Debian package rules change:

diff --git a/debian/control b/debian/control
index 3e03689..ab23b3b 100644
--- a/debian/control
+++ b/debian/control
@@ -38,7 +38,6 @@ Build-Depends: autoconf,
                libexpat1-dev,
                libfcgi-dev,
                libfuse-dev,
-               libgoogle-perftools-dev [i386 amd64 arm64],
                libkeyutils-dev,
                libleveldb-dev,
                libnss3-dev,
diff --git a/debian/rules b/debian/rules
index b705dd6..7db5b9a 100755
--- a/debian/rules
+++ b/debian/rules
@@ -23,7 +23,7 @@ export DEB_HOST_ARCH      ?= $(shell dpkg-architecture -qDEB_HOST_ARCH)
 extraopts += --with-ocf --with-nss
 extraopts += --with-debug
 extraopts += --enable-cephfs-java
-
+extraopts += --with-jemalloc
 # rocksdb is not packaged by anyone.  build it if we can.
 extraopts += --with-librocksdb-static=check


>>And what's the side effect of debug ms = 0/0?

I don't see any side effect; you just won't have debug information (but since you are running in production, that shouldn't be a problem).
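
If you want to apply it at runtime without restarting the daemons, something along these lines should work for the OSDs (a sketch only; exact injectargs syntax can vary between releases):

ceph tell osd.* injectargs '--debug-ms 0/0'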


>>And it looks like disabling cephx auth is not good for production use... does cephx affect performance a lot?

For me, there is still a 10-20% difference with cephx enabled.
If you only use your Ceph cluster for your QEMU cluster, I don't see any problem with disabling it
(provided, of course, that your Ceph cluster is firewalled, or network access is restricted to your QEMU clients).

Note that changing it online is not possible, so you need to shut down all the clients before making this change.
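
For reference, disabling cephx amounts to setting the auth options to none in ceph.conf on every node and client, then restarting the daemons (after stopping the clients, as noted above). A minimal sketch:

[global]
auth_cluster_required = none
auth_service_required = none
auth_client_required = none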



----- Original Message -----
From: "Bill WONG" <wongahshuen@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "dillaman" <dillaman@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 7 November 2016 06:35:38
Subject: Re: RBD Block performance vs rbd mount as filesystem

Hi Alexandre,
thank you!
Is there any document on how I can compile Ceph with jemalloc as well? It looks like Ceph with jemalloc gives much better performance too.
And what's the side effect of debug ms = 0/0? Also, it looks like disabling cephx auth is not good for production use... does cephx affect performance a lot?


On Sat, Nov 5, 2016 at 5:55 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:


Here are some tips I use to improve librbd and QEMU performance:

- disabling cephx auth

- disable debug_ms (I jump from 30k IOPS to 45k IOPS with 4k randread):

[global]

debug ms = 0/0


- compile qemu with jemalloc (--enable-jemalloc)
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html
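
For reference, a minimal sketch of such a QEMU build, assuming a QEMU version that still provides the --enable-jemalloc switch (it was added in 2.4) and that the jemalloc development headers are installed:

./configure --target-list=x86_64-softmmu --enable-jemalloc
make -j$(nproc)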



----- Original Message -----
From: "Jason Dillaman" <jdillama@xxxxxxxxxx>
To: "Bill WONG" <wongahshuen@xxxxxxxxx>
Cc: "aderumier" <aderumier@xxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 1 November 2016 02:06:22
Subject: Re: RBD Block performance vs rbd mount as filesystem

For better or worse, I can repeat your "ioping" findings against a
qcow2 image hosted on a krbd-backed volume. The "bad" news is that it
actually isn't even sending any data to the OSDs -- which is why your
latency is shockingly low. When performing a "dd ... oflag=dsync"
against the krbd-backed qcow2 image, I can see lots of IO being
coalesced from 4K writes into larger writes, which is artificially
inflating the stats.
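
For illustration, the kind of invocation being described is something like the following (the file path and count are placeholders, not the exact command used):

# issue synchronous 4K writes to a file on the filesystem under test
dd if=/dev/zero of=/mnt/test/ddtest.bin bs=4k count=10000 oflag=dsync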



On Mon, Oct 31, 2016 at 11:08 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
> Hi Jason,
>
> It looks like the situation is the same, no difference. My ceph.conf is below;
> any comments or improvements required?
> ---
> [global]
> fsid = 106a12b0-5ed0-4a71-b6aa-68a09088ec33
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
> mon_host = 192.168.8.11,192.168.8.12,192.168.8.13
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 4096
> osd pool default pgp num = 4096
> osd_crush_chooseleaf_type = 1
> mon_pg_warn_max_per_osd = 0
> max_open_files = 131072
>
> [mon]
> mon_data = /var/lib/ceph/mon/ceph-$id
>
> mon clock drift allowed = 2
> mon clock drift warn backoff = 30
>
> [osd]
> osd_data = /var/lib/ceph/osd/ceph-$id
> osd_journal_size = 20000
> osd_mkfs_type = xfs
> osd_mkfs_options_xfs = -f
> filestore_xattr_use_omap = true
> filestore_min_sync_interval = 10
> filestore_max_sync_interval = 15
> filestore_queue_max_ops = 25000
> filestore_queue_max_bytes = 10485760
> filestore_queue_committing_max_ops = 5000
> filestore_queue_committing_max_bytes = 10485760000
> journal_max_write_bytes = 1073714824
> journal_max_write_entries = 10000
> journal_queue_max_ops = 50000
> journal_queue_max_bytes = 10485760000
> osd_max_write_size = 512
> osd_client_message_size_cap = 2147483648
> osd_deep_scrub_stride = 131072
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_map_cache_size = 1024
> osd_map_cache_bl_size = 128
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
> osd_recovery_op_priority = 4
> osd_recovery_max_active = 10
> osd_max_backfills = 4
> rbd non blocking aio = false
>
> [client]
> rbd_cache = true
> rbd_cache_size = 268435456
> rbd_cache_max_dirty = 134217728
> rbd_cache_max_dirty_age = 5
> ---
>
>
>
> On Mon, Oct 31, 2016 at 9:20 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>
>> On Sun, Oct 30, 2016 at 5:40 AM, Bill WONG <wongahshuen@xxxxxxxxx> wrote:
>> > any ideas or comments?
>>
>> Can you set "rbd non blocking aio = false" in your ceph.conf and retry
>> librbd? This will eliminate at least one context switch on the read IO
>> path -- which results in increased latency under extremely low queue
>> depths.
>>
>> --
>> Jason
>
>



--
Jason






_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
