Re: very different performance on two volumes in the same pool #2

Alexandre DERUMIER <aderumier@xxxxxxxxx> · Mon, 11 May 2015 18:29:55 +0200 (CEST)

Hi,
I'm currently running benchmarks too, and I don't see this behavior.

>>I get very nice performance of up to 200k IOPS. However once the volume is
>>written to (ie when I map it using rbd map and dd whole volume with some random data),
>>and repeat the benchmark, random performance drops to ~23k IOPS.

I can reach 200k IOPS with a single OSD when the data has been written to the OSD and is still in the OSD's buffer.

OSD CPU: 60% of 2x10 cores at 3.1 GHz
fio-rbd CPU: 40% of 2x10 cores at 3.1 GHz

(So I'm not sure what performance to expect with only one quad-core CPU.)

When the data actually has to be read from the OSD's disk, I can reach around 60k IOPS per SSD on Intel S3500 drives
(with readahead disabled:
echo 0 > /sys/class/block/sdX/queue/read_ahead_kb)
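To make the readahead change easy to check and revert, here is a minimal sketch of two helpers around the same sysfs file. The function names and the SYSFS_ROOT override are hypothetical (the override only exists so the helpers can be tried against a scratch directory); on a real host, writing the value needs root and does not survive a reboot.

```shell
#!/bin/sh
# Sketch: read or set block-layer readahead for one device.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

readahead_path() {
    # $1 = device name without /dev/, e.g. sdb
    printf '%s/class/block/%s/queue/read_ahead_kb\n' "$SYSFS_ROOT" "$1"
}

get_readahead() {
    # Print the current readahead in KiB.
    cat "$(readahead_path "$1")"
}

set_readahead() {
    # $1 = device, $2 = readahead in KiB (0 disables it).
    printf '%s\n' "$2" > "$(readahead_path "$1")"
}

# Example (as root):
#   old=$(get_readahead sdb); set_readahead sdb 0
#   ... run the benchmark ...
#   set_readahead sdb "$old"
```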

Here is my ceph.conf:
-----------------
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
osd_pool_default_min_size = 1
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_journaler = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 2
osd_op_num_shards = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_nocrc = true
ms_dispatch_throttle_bytes = 0
cephx_sign_messages = false
cephx_require_signatures = false
throttler_perf_counter = false
ms_crc_header = false
ms_crc_data = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false

[client]
rbd_cache = false
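
To try the debug_* settings above on running OSDs without a restart, one option is to turn them into an injectargs string. This is only a sketch: debug_injectargs is a hypothetical helper name, it assumes one "key = value" per line with the underscore spelling, and values injected this way are lost when the OSD restarts.

```shell
#!/bin/sh
# Sketch: convert the debug_* lines of a ceph.conf (read from stdin)
# into "--debug-xxx value" arguments suitable for injectargs.
debug_injectargs() {
    awk -F'=' '
        /^[ \t]*debug_/ {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)      # strip whitespace around the key
            gsub(/[ \t]/, "", val)      # strip whitespace around the value
            gsub(/_/, "-", key)         # debug_osd -> debug-osd
            printf "%s--%s %s", sep, key, val
            sep = " "
        }
        END { if (sep != "") print "" }
    '
}

# Example (assumes the conf lives in /etc/ceph/ceph.conf):
#   ceph tell 'osd.*' injectargs "$(debug_injectargs < /etc/ceph/ceph.conf)"
```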

----- Original Message -----
From: "Nikola Ciprich" <nikola.ciprich@xxxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Cc: nik@xxxxxxxxxxx
Sent: Monday, 11 May 2015 06:43:04
Subject: very different performance on two volumes in the same pool #2

Hello ceph developers and users, 

some time ago, I posted here a question regarding very different 
performance for two volumes in one pool (backed by SSD drives). 

After some examination, I probably got to the root of the problem.. 

When I create a fresh volume (i.e. rbd create --image-format 2 --size 51200 ssd/test)
and run a random I/O fio benchmark

fio --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 --readwrite=randread 

I get very nice performance of up to 200k IOPS. However once the volume is 
written to (ie when I map it using rbd map and dd whole volume with some random data), 
and repeat the benchmark, random performance drops to ~23k IOPS. 

This leads me to conjecture that for unwritten (sparse) volumes, read 
is just a noop, simply returning zeroes without really having to read 
data from physical storage, and thus showing nice performance, but once 
the volume is written, performance drops due to need to physically read the 
data, right? 
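
One way to check this conjecture is to compare how much of the image is actually allocated before and after the dd. The sketch below assumes that rbd diff prints one row per allocated extent as "offset length type" (with a header line) and sums the length column; allocated_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sum the "Length" column of `rbd diff` output read from stdin.
# Rows whose second field is not a plain number (e.g. the header) are skipped.
allocated_bytes() {
    awk '
        $2 ~ /^[0-9]+$/ { sum += $2 }
        END { printf "%.0f\n", sum }
    '
}

# Example, using the image name from the message above:
#   rbd diff ssd/test | allocated_bytes
# A freshly created image should report a value near 0, and a fully
# written one a value near the image size.
```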

However I'm a bit unhappy about the performance drop, the pool is backed 
by 3 SSD drives (each having random io performance of 100k iops) on three 
nodes, and object size is set to 3. Cluster is completely idle, nodes 
are quad core Xeons E3-1220 v3 @ 3.10GHz, 32GB RAM each, centos 6, kernel 3.18.12, 
ceph 0.94.1. I'm using libtcmalloc (I even tried upgrading gperftools-libs to 2.4) 
Nodes are connected using 10gb ethernet, with jumbo frames enabled. 

I tried tuning following values: 

osd_op_threads = 5 
filestore_op_threads = 4 
osd_op_num_threads_per_shard = 1 
osd_op_num_shards = 25 
filestore_fd_cache_size = 64 
filestore_fd_cache_shards = 32 
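
For comparing such settings, note that the two shard options multiply: as far as is known for this release, the sharded OSD op queue runs osd_op_num_shards x osd_op_num_threads_per_shard worker threads per OSD (treat that as an assumption). A small sketch that computes the product from "key = value" lines; op_worker_threads is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read "key = value" lines from stdin and print
# osd_op_num_shards * osd_op_num_threads_per_shard.
# Assumes the underscore spelling of the option names.
op_worker_threads() {
    awk -F'=' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
        }
        key == "osd_op_num_shards"            { shards = val }
        key == "osd_op_num_threads_per_shard" { per = val }
        END { print shards * per }
    '
}

# Example:
#   op_worker_threads < /etc/ceph/ceph.conf
```

With the values listed here this gives 25 threads per OSD on a quad-core node; the ceph.conf earlier in this thread (10 shards, 2 threads per shard) gives 20 on 2x10 cores.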

I don't see anything special in perf: 

5.43% [kernel] [k] acpi_processor_ffh_cstate_enter 
2.93% libtcmalloc.so.4.2.6 [.] 0x0000000000017d2c 
2.45% libpthread-2.12.so [.] pthread_mutex_lock 
2.37% libpthread-2.12.so [.] pthread_mutex_unlock 
2.33% [kernel] [k] do_raw_spin_lock 
2.00% libsoftokn3.so [.] 0x000000000001f455 
1.96% [kernel] [k] __switch_to 
1.32% [kernel] [k] __schedule 
1.24% libstdc++.so.6.0.13 [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char 
1.24% libc-2.12.so [.] memcpy 
1.19% libtcmalloc.so.4.2.6 [.] operator delete(void*) 
1.16% [kernel] [k] __d_lookup_rcu 
1.09% libstdc++.so.6.0.13 [.] 0x000000000007d6be 
0.93% libstdc++.so.6.0.13 [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) 
0.93% ceph-osd [.] crush_hash32_3 
0.85% libc-2.12.so [.] vfprintf 
0.84% libc-2.12.so [.] __strlen_sse42 
0.80% [kernel] [k] get_futex_key_refs 
0.80% libpthread-2.12.so [.] pthread_mutex_trylock 
0.78% libtcmalloc.so.4.2.6 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) 
0.71% libstdc++.so.6.0.13 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) 
0.68% ceph-osd [.] ceph::log::Log::flush() 
0.66% libtcmalloc.so.4.2.6 [.] tc_free 
0.63% [kernel] [k] resched_curr 
0.63% [kernel] [k] page_fault 
0.62% libstdc++.so.6.0.13 [.] std::string::reserve(unsigned long) 
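
A flat list like this is easier to read when grouped by shared object. The sketch below sums the overhead column per object for lines in the "percent object [type] symbol" layout shown above; perf_by_dso is a hypothetical helper name. Grouped this way, the share spent in libstdc++ and libc string formatting (next to ceph::log::Log::flush) may hint at logging overhead, which the debug_* = 0/0 settings earlier in this thread are meant to reduce.

```shell
#!/bin/sh
# Sketch: aggregate perf overhead per shared object.
# Input: lines such as "2.45% libpthread-2.12.so [.] pthread_mutex_lock".
perf_by_dso() {
    awk '
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%/, "", pct)       # drop the percent sign
            total[$2] += pct        # $2 is the shared object name
        }
        END { for (dso in total) printf "%.2f %s\n", total[dso], dso }
    ' | LC_ALL=C sort -rn
}

# Example (perf.txt is a hypothetical file holding the pasted lines):
#   perf_by_dso < perf.txt
```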

I'm running the benchmark directly on one of the nodes, which I know is not optimal,
but it's still able to give those 200k IOPS for an empty volume, so I guess it
shouldn't be a problem..

Another story is random write performance, which is very poor, but I'd like
to deal with read performance first..

so my question is, are those numbers normal? If not, what should I check? 

I'll be very grateful for all the hints I could get.. 

thanks a lot in advance 

nik 

-- 
------------------------------------- 
Ing. Nikola CIPRICH 
LinuxBox.cz, s.r.o. 
28.rijna 168, 709 00 Ostrava 

tel.: +420 591 166 214 
fax: +420 596 621 273 
mobil: +420 777 093 799 
www.linuxbox.cz 

mobil servis: +420 737 238 656 
email servis: servis@xxxxxxxxxxx 
------------------------------------- 

_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 