Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference


 



>> Maybe you can use perf to find the worst offending hotspots and a place to start? 

I already did that some months ago (fio-rbd on debian wheezy), 
http://tracker.ceph.com/issues/10139

But I'll try to update it with my new results on jessie.
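A minimal sketch of how such a profile can be captured, for reference (standard perf usage; attaching to the running fio process is only one way to do it, and nothing below is taken from the tracker issue):

$ perf record -g -p $(pgrep -x fio | head -1) -- sleep 30   # sample the running fio-rbd process for 30s with call graphs
$ perf report --sort dso,symbol                             # break the hotspots down per library and symbol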


----- Original Message -----
From: "Milosz Tanski" <milosz@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>, "cbt" <cbt@xxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Tuesday, May 12, 2015 16:37:38
Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference

On Tue, May 12, 2015 at 4:17 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote: 
>>>Sounds good. Any reason for not switching to tcmalloc by default in PVE? 
> 
> I'm currently benching it inside qemu, but I don't see much improvement. 
> 
> I'm around 30000 iops per virtio disk, with glibc or tcmalloc. (I don't know if jemalloc works fine with qemu.) 

I'm going to guess that there's a whole slew of stuff that happens 
between qemu and the guest that results in a lower bound for iops. 
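A minimal sketch of how the same LD_PRELOAD trick could be tried on a qemu process to see whether jemalloc behaves there; the library path and the rbd drive spec are assumptions, not something tested in this thread:

$ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1   # assumed path of the distro jemalloc package
$ qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive file=rbd:rbd/vm-disk-1,if=virtio,format=raw,cache=none   # usual VM options, now running under the preloaded allocator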

> 
> 
> I don't know if all these memory allocation calls could be reduced in librbd/librados? 
> 

Maybe you can use perf to find the worst offending hotspots and a place to start? 

> 
> 
> ----- Original Message ----- 
> From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx> 
> To: "aderumier" <aderumier@xxxxxxxxx>, "Milosz Tanski" <milosz@xxxxxxxxx> 
> Cc: "cbt" <cbt@xxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
> Sent: Tuesday, May 12, 2015 08:12:08 
> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference 
> 
> On 12.05.2015 at 02:34, Alexandre DERUMIER wrote: 
>>>> You can try it and see if it'll make a difference. Set LD_PRELOAD to 
>>>> include the so of jemalloc / tcmalloc before starting FIO. Like this: 
>>>> 
>>>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1 
>>>> $ ./run_test.sh 
>> 
>> Thanks, it's working. 
>> 
>> Seems that jemalloc with fio-rbd gives a 17% iops improvement and reduces latencies and cpu usage! 
>> 
>> results with 1 numjob: 
>> 
>> glibc : iops=36668 usr=62.23%, sys=12.13% 
>> libtcmalloc : iops=36105 usr=63.54%, sys=8.45% 
>> jemalloc: iops=43181 usr=60.91%, sys=10.51% 
>> 
>> 
>> (With 10 numjobs, I'm around 240k iops with jemalloc vs 220k iops with glibc/tcmalloc.) 
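For anyone trying to reproduce these numbers, a job file along these lines should roughly match the output below; the rbd engine, 4k reads, iodepth=32 and numjobs are implied by the logs, while the pool/image/client names and the randread/size settings are assumptions:

$ cat > rbd_iodepth32.fio <<'EOF'
[global]
ioengine=rbd            # fio's userspace librbd engine
clientname=admin        # assumed cephx user
pool=rbd                # assumed pool name
rbdname=fio-test        # assumed test image
rw=randread             # assumed; the logs only show a read workload
bs=4k
size=30g                # inferred from io=30000MB in the results
[rbd_iodepth32-test]
iodepth=32
numjobs=1
EOF
$ fio rbd_iodepth32.fio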
>> 
>> 
>> I just found a patch in qemu git to enable tcmalloc: 
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=2847b46958ab0bd604e1b3fcafba0f5ba4375833 
>> I'll try to test it to see if it helps. 
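If that commit does what it looks like, it only adds a configure switch, so a test build would be roughly as follows (the --enable-tcmalloc flag is what the commit appears to add, the rest is a generic qemu build, not something verified in this thread):

$ git clone git://git.qemu.org/qemu.git && cd qemu
$ ./configure --target-list=x86_64-softmmu --enable-tcmalloc
$ make -j$(nproc)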
> 
> Sounds good. Any reason for not switching to tcmalloc by default in PVE? 
> 
> Stefan 
> 
>> 
>> 
>> 
>> 
>> 
>> 
>> fio results 
>> ------------ 
>> 
>> glibc 
>> ----- 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [123.9MB/0KB/0KB /s] [31.8K/0/0 iops] [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7239: Tue May 12 02:05:46 2015 
>> read : io=30000MB, bw=146675KB/s, iops=36668, runt=209443msec 
>> slat (usec): min=8, max=1245, avg=26.07, stdev=13.99 
>> clat (usec): min=107, max=4752, avg=525.40, stdev=207.46 
>> lat (usec): min=126, max=4767, avg=551.47, stdev=208.27 
>> clat percentiles (usec): 
>> | 1.00th=[ 171], 5.00th=[ 215], 10.00th=[ 253], 20.00th=[ 322], 
>> | 30.00th=[ 386], 40.00th=[ 450], 50.00th=[ 516], 60.00th=[ 588], 
>> | 70.00th=[ 652], 80.00th=[ 716], 90.00th=[ 796], 95.00th=[ 868], 
>> | 99.00th=[ 996], 99.50th=[ 1048], 99.90th=[ 1192], 99.95th=[ 1240], 
>> | 99.99th=[ 1368] 
>> bw (KB /s): min=112328, max=176848, per=100.00%, avg=146768.86, stdev=12974.09 
>> lat (usec) : 250=9.61%, 500=37.58%, 750=37.25%, 1000=14.60% 
>> lat (msec) : 2=0.96%, 4=0.01%, 10=0.01% 
>> cpu : usr=62.23%, sys=12.13%, ctx=10008821, majf=0, minf=1348 
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.8%, 16=64.2%, 32=4.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.9%, 64=0.0%, >=64=0.0% 
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=30000MB, aggrb=146674KB/s, minb=146674KB/s, maxb=146674KB/s, mint=209443msec, maxt=209443msec 
>> 
>> Disk stats (read/write): 
>> sdb: ios=0/22, merge=0/13, ticks=0/0, in_queue=0, util=0.00% 
>> 
>> 
>> jemalloc 
>> -------- 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [165.4MB/0KB/0KB /s] [42.3K/0/0 iops] [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7137: Tue May 12 02:01:25 2015 
>> read : io=30000MB, bw=172726KB/s, iops=43181, runt=177854msec 
>> slat (usec): min=6, max=563, avg=22.28, stdev=14.68 
>> clat (usec): min=95, max=3559, avg=456.29, stdev=168.37 
>> lat (usec): min=110, max=3579, avg=478.56, stdev=169.06 
>> clat percentiles (usec): 
>> | 1.00th=[ 161], 5.00th=[ 201], 10.00th=[ 233], 20.00th=[ 290], 
>> | 30.00th=[ 346], 40.00th=[ 402], 50.00th=[ 454], 60.00th=[ 506], 
>> | 70.00th=[ 556], 80.00th=[ 612], 90.00th=[ 676], 95.00th=[ 732], 
>> | 99.00th=[ 844], 99.50th=[ 900], 99.90th=[ 1020], 99.95th=[ 1064], 
>> | 99.99th=[ 1192] 
>> bw (KB /s): min=129936, max=199712, per=100.00%, avg=172822.83, stdev=11812.99 
>> lat (usec) : 100=0.01%, 250=12.77%, 500=45.87%, 750=37.60%, 1000=3.62% 
>> lat (msec) : 2=0.13%, 4=0.01% 
>> cpu : usr=60.91%, sys=10.51%, ctx=9329053, majf=0, minf=1687 
>> IO depths : 1=0.1%, 2=0.1%, 4=1.8%, 8=26.4%, 16=67.5%, 32=4.2%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=95.9%, 8=0.1%, 16=0.1%, 32=4.0%, 64=0.0%, >=64=0.0% 
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=30000MB, aggrb=172725KB/s, minb=172725KB/s, maxb=172725KB/s, mint=177854msec, maxt=177854msec 
>> 
>> Disk stats (read/write): 
>> sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% 
>> 
>> 
>> libtcmalloc 
>> ------------ 
>> rbd engine: RBD version: 0.1.10 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [140.1MB/0KB/0KB /s] [35.9K/0/0 iops] [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=7039: Tue May 12 01:57:41 2015 
>> read : io=30000MB, bw=144423KB/s, iops=36105, runt=212708msec 
>> slat (usec): min=10, max=803, avg=26.65, stdev=17.68 
>> clat (usec): min=54, max=5052, avg=530.82, stdev=216.05 
>> lat (usec): min=114, max=5531, avg=557.46, stdev=217.22 
>> clat percentiles (usec): 
>> | 1.00th=[ 169], 5.00th=[ 213], 10.00th=[ 251], 20.00th=[ 322], 
>> | 30.00th=[ 386], 40.00th=[ 454], 50.00th=[ 524], 60.00th=[ 596], 
>> | 70.00th=[ 660], 80.00th=[ 724], 90.00th=[ 804], 95.00th=[ 876], 
>> | 99.00th=[ 1048], 99.50th=[ 1128], 99.90th=[ 1336], 99.95th=[ 1464], 
>> | 99.99th=[ 2256] 
>> bw (KB /s): min=60416, max=161496, per=100.00%, avg=144529.50, stdev=10827.54 
>> lat (usec) : 100=0.01%, 250=9.88%, 500=36.69%, 750=36.97%, 1000=14.88% 
>> lat (msec) : 2=1.57%, 4=0.01%, 10=0.01% 
>> cpu : usr=63.54%, sys=8.45%, ctx=9209514, majf=0, minf=2120 
>> IO depths : 1=0.1%, 2=0.1%, 4=3.0%, 8=28.9%, 16=64.0%, 32=4.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=96.1%, 8=0.1%, 16=0.1%, 32=3.8%, 64=0.0%, >=64=0.0% 
>> issued : total=r=7680000/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> 
>> 
>> 
>> 
>> ----- Original Message ----- 
>> From: "Milosz Tanski" <milosz@xxxxxxxxx> 
>> To: "aderumier" <aderumier@xxxxxxxxx> 
>> Cc: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>, "cbt" <cbt@xxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
>> Sent: Monday, May 11, 2015 23:38:51 
>> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference 
>> 
>> On Mon, May 11, 2015 at 10:20 AM, Alexandre DERUMIER 
>> <aderumier@xxxxxxxxx> wrote: 
>>>>> That's pretty interesting. I wasn't aware that there were performance 
>>>>> optimisations in glibc. 
>>>>> 
>>>>> As you have a test setup. Is it possible to install jessie libc on wheezy? 
>>> 
>>> mmm, I can try that. Not sure it'll work. 
>>> 
>>> 
>>> BTW, librbd cpu usage is always 3x-4x more than KRBD. 
>>> A lot of cpu is used by malloc/free. It would be great to optimise that. 
>>> 
>>> I don't know if jemalloc or tcmalloc could be used, like for the osd daemons? 
>> 
>> You can try it and see if it'll make a difference. Set LD_PRELOAD to 
>> include the so of jemalloc / tcmalloc before starting FIO. Like this: 
>> 
>> $ export LD_PRELOAD=${JEMALLOC_PATH}/lib/libjemalloc.so.1 
>> $ ./run_test.sh 
>> 
>> As a matter of policy, libraries shouldn't force a particular malloc 
>> implementation on their users. It might go against the user's wishes, 
>> not to mention the conflicts that would happen if one library wanted / 
>> needed jemalloc while another one wanted / needed tcmalloc. 
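A quick way to double-check which allocator a running process actually picked up (plain /proc inspection, nothing specific to this setup):

$ grep -E 'jemalloc|tcmalloc' /proc/$(pgrep -x fio | head -1)/maps   # no output means the process is on plain glibc malloc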
>> 
>>> 
>>> 
>>> Reducing cpu usage could improve qemu performance a lot, as qemu uses only 1 thread per disk. 
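As a side note, newer qemu can at least give each virtio disk its own iothread, which spreads that per-disk thread cost; a rough sketch with made-up ids and drive names, not a configuration tested in this thread:

$ qemu-system-x86_64 -enable-kvm -m 4096 \
    -object iothread,id=iothread0 \
    -drive file=rbd:rbd/vm-disk-1,if=none,id=drive0,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0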
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> De: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx> 
>>> To: "aderumier" <aderumier@xxxxxxxxx>, "cbt" <cbt@xxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx> 
>>> Sent: Monday, May 11, 2015 12:30:03 
>>> Subject: Re: [Cbt] client fio-rbd benchmark : debian wheezy vs ubuntu vivid : big difference 
>>> 
>>> On 11.05.2015 at 07:53, Alexandre DERUMIER wrote: 
>>>> Seems that it's ok too on debian jessie (with an extra boost with rbd_cache=true). 
>>>> 
>>>> Maybe it is related to the old glibc on debian wheezy? 
>>> 
>>> That's pretty interesting. I wasn't aware that there were performance 
>>> optimisations in glibc. 
>>> 
>>> As you have a test setup. Is it possible to install jessie libc on wheezy? 
>>> 
>>> Stefan 
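For what it's worth, checking which glibc the benchmark actually runs against is quick; wheezy ships (e)glibc 2.13 while jessie ships 2.19, which is the version visible in the perf report below (the library path assumes an amd64 Debian install):

$ ldd $(which fio) | grep libc
$ /lib/x86_64-linux-gnu/libc.so.6 | head -1   # running libc directly prints its version banner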
>>> 
>>> 
>>>> 
>>>> debian jessie: rbd_cache=false : iops=202985 : %Cpu(s): 21,9 us, 9,5 sy, 0,0 ni, 66,1 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st 
>>>> debian jessie: rbd_cache=true : iops=215290 : %Cpu(s): 27,9 us, 10,8 sy, 0,0 ni, 58,8 id, 0,0 wa, 0,0 hi, 2,6 si, 0,0 st 
>>>> 
>>>> 
>>>> ubuntu vivid : rbd_cache=false : iops=201089 %Cpu(s): 21,3 us, 12,8 sy, 0,0 ni, 61,8 id, 0,0 wa, 0,0 hi, 4,1 si, 0,0 st 
>>>> ubuntu vivid : rbd_cache=true : iops=197549 %Cpu(s): 27,2 us, 15,3 sy, 0,0 ni, 53,2 id, 0,0 wa, 0,0 hi, 4,2 si, 0,0 st 
>>>> debian wheezy : rbd_cache=false: iops=161272 %Cpu(s): 28.4 us, 15.4 sy, 0.0 ni, 52.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st 
>>>> debian wheezy : rbd_cache=true : iops=135893 %Cpu(s): 30.0 us, 15.5 sy, 0.0 ni, 51.5 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st 
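In case anyone wants to repeat the rbd_cache comparison: the usual client-side toggle is a ceph.conf entry (or the equivalent rbd_cache option of the fio rbd engine); a sketch, not the exact config used for these runs:

$ cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
rbd cache = true        # set to false for the rbd_cache=false runs
EOF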
>>>> 
>>>> 
>>>> 
>>>> jessie perf report 
>>>> ------------------ 
>>>> + 9,18% 3,75% fio libc-2.19.so [.] malloc 
>>>> + 6,76% 5,70% fio libc-2.19.so [.] _int_malloc 
>>>> + 5,83% 5,64% fio libc-2.19.so [.] _int_free 
>>>> + 5,11% 0,15% fio libpthread-2.19.so [.] __libc_recv 
>>>> + 4,81% 4,81% swapper [kernel.kallsyms] [k] intel_idle 
>>>> + 3,72% 0,37% fio libpthread-2.19.so [.] pthread_cond_broadcast@@GLIBC_2.3.2 
>>>> + 3,41% 0,04% fio libpthread-2.19.so [.] 0x000000000000efad 
>>>> + 3,31% 0,54% fio libpthread-2.19.so [.] pthread_cond_wait@@GLIBC_2.3.2 
>>>> + 3,19% 0,09% fio libpthread-2.19.so [.] __lll_unlock_wake 
>>>> + 2,52% 0,00% fio librados.so.2.0.0 [.] ceph::buffer::create_aligned(unsigned int, unsigned int) 
>>>> + 2,09% 0,08% fio libc-2.19.so [.] __posix_memalign 
>>>> + 2,04% 0,26% fio libpthread-2.19.so [.] __lll_lock_wait 
>>>> + 2,02% 0,13% fio libc-2.19.so [.] _mid_memalign 
>>>> + 1,95% 1,91% fio libc-2.19.so [.] __memcpy_sse2_unaligned 
>>>> + 1,88% 0,08% fio libc-2.19.so [.] _int_memalign 
>>>> + 1,88% 0,00% fio libc-2.19.so [.] __clone 
>>>> + 1,88% 0,00% fio libpthread-2.19.so [.] start_thread 
>>>> + 1,88% 0,12% fio fio [.] thread_main 
>>>> + 1,37% 1,37% swapper [kernel.kallsyms] [k] native_write_msr_safe 
>>>> + 1,29% 0,05% fio libc-2.19.so [.] __lll_unlock_wake_private 
>>>> + 1,24% 1,24% fio libpthread-2.19.so [.] pthread_mutex_trylock 
>>>> + 1,24% 0,29% fio libc-2.19.so [.] __lll_lock_wait_private 
>>>> + 1,19% 0,21% fio librbd.so.1.0.0 [.] std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear() 
>>>> + 1,19% 1,19% fio libc-2.19.so [.] free 
>>>> + 1,18% 1,18% fio libc-2.19.so [.] malloc_consolidate 
>>>> + 1,14% 1,14% fio [kernel.kallsyms] [k] get_futex_key_refs.isra.13 
>>>> + 1,10% 1,10% fio [kernel.kallsyms] [k] __schedule 
>>>> + 1,00% 0,28% fio librados.so.2.0.0 [.] ceph::buffer::list::append(char const*, unsigned int) 
>>>> + 0,96% 0,00% fio librbd.so.1.0.0 [.] 0x000000000005b2e7 
>>>> + 0,96% 0,96% fio [kernel.kallsyms] [k] _raw_spin_lock 
>>>> + 0,92% 0,21% fio librados.so.2.0.0 [.] ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int) 
>>>> + 0,91% 0,00% fio librados.so.2.0.0 [.] 0x000000000006e6c0 
>>>> + 0,90% 0,90% swapper [kernel.kallsyms] [k] __switch_to 
>>>> + 0,89% 0,01% fio librbd.so.1.0.0 [.] 0x00000000000ce1f1 
>>>> + 0,89% 0,89% swapper [kernel.kallsyms] [k] cpu_startup_entry 
>>>> + 0,87% 0,01% fio librados.so.2.0.0 [.] 0x00000000002e3ff1 
>>>> + 0,86% 0,00% fio libc-2.19.so [.] 0x00000000000dd50d 
>>>> + 0,85% 0,85% fio [kernel.kallsyms] [k] try_to_wake_up 
>>>> + 0,83% 0,83% swapper [kernel.kallsyms] [k] __schedule 
>>>> + 0,82% 0,82% fio [kernel.kallsyms] [k] copy_user_enhanced_fast_string 
>>>> + 0,81% 0,00% fio librados.so.2.0.0 [.] 0x0000000000137abc 
>>>> + 0,80% 0,80% swapper [kernel.kallsyms] [k] menu_select 
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] _raw_spin_lock_bh 
>>>> + 0,75% 0,75% fio [kernel.kallsyms] [k] futex_wake 
>>>> + 0,75% 0,75% fio libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt 
>>>> + 0,73% 0,73% fio [kernel.kallsyms] [k] __switch_to 
>>>> + 0,70% 0,70% fio libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) 
>>>> + 0,70% 0,36% fio librados.so.2.0.0 [.] ceph::buffer::list::iterator::copy(unsigned int, char*) 
>>>> + 0,70% 0,23% fio fio [.] get_io_u 
>>>> + 0,67% 0,67% fio [kernel.kallsyms] [k] finish_task_switch 
>>>> + 0,67% 0,32% fio libpthread-2.19.so [.] pthread_rwlock_unlock 
>>>> + 0,67% 0,00% fio librados.so.2.0.0 [.] 0x00000000000cea98 
>>>> + 0,64% 0,00% fio librados.so.2.0.0 [.] 0x00000000002e3f87 
>>>> + 0,63% 0,63% fio [kernel.kallsyms] [k] futex_wait_setup 
>>>> + 0,62% 0,62% swapper [kernel.kallsyms] [k] enqueue_task_fair 
>>>> 
>>> 
>> 
>> 
>> 



-- 
Milosz Tanski 
CTO 
16 East 34th Street, 15th floor 
New York, NY 10016 

p: 646-253-9055 
e: milosz@xxxxxxxxx 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



