How big are those ops? Are they random? How many nodes? How many SSDs/OSDs? What are you using to run the tests? With atop running on the OSD nodes, where is your bottleneck?
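For example, something along these lines would answer most of that (a rough sketch only; the pool name, block size, thread count, and PID selection are placeholders, not values from this thread):
---
# small random-ish write benchmark against a test pool ("rbd" is just an example pool name)
rados bench -p rbd 60 write -b 4096 -t 16 --no-cleanup

# watch disk/CPU saturation on each OSD node while the benchmark runs
atop 2

# profile one OSD process to see whether tcmalloc (or something else) is eating the CPU
perf top -p $(pidof ceph-osd | awk '{print $1}')
---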
On Mon, Aug 17, 2015 at 1:05 PM, Межов Игорь Александрович <megov@xxxxxxxxxx> wrote:
Hi!
We also observe the same behavior on our test Hammer install, and I wrote about it some time ago:
http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22609
Jan Schermer gave us some suggestions in that thread, but we still have not seen any positive results - TCMalloc usage is
still high. It drops below 10% when we disable crc in messages, disable debug and disable cephx auth (sketched after
the trace below), but that is of course not for production use. We also got a different trace while performing FIO-RBD
benchmarks on the ssd pool:
---
46,07% [kernel] [k] _raw_spin_lock
6,51% [kernel] [k] mb_cache_entry_alloc
5,74% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
5,50% libtcmalloc.so.4.2.2 [.] tcmalloc::SLL_Next(void*)
3,86% libtcmalloc.so.4.2.2 [.] TCMalloc_PageMap3<35>::get(unsigned long) const
2,73% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
0,69% libtcmalloc.so.4.2.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
0,69% libtcmalloc.so.4.2.2 [.] tcmalloc::PageHeap::GetDescriptor(unsigned long) const
0,64% libtcmalloc.so.4.2.2 [.] tcmalloc::SLL_PopRange(void**, int, void**, void**)
---
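For reference, the crc/debug/cephx changes mentioned above correspond roughly to a ceph.conf fragment like the one below. This is only a sketch: the exact option names are from memory for a Hammer-era install and should be verified, and disabling any of this is not safe for production.
---
[global]
# disable cephx authentication (requires restart; not for production)
auth_cluster_required = none
auth_service_required = none
auth_client_required = none

# disable messenger crc checks (option name is a Hammer-era assumption)
ms_nocrc = true

[osd]
# silence most debug logging
debug_ms = 0/0
debug_osd = 0/0
debug_filestore = 0/0
---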
I don't clearly understand what is happening in this case: the ssd pool is attached to the same host,
but on a different controller (onboard C60X instead of LSI 2208), the io scheduler is set to noop, and the pool is built
from 4x 400GB Intel DC S3700 drives, so I would expect it to perform better - more than 30-40 kIOPS.
But we get the trace above and no more than 12-15 kIOPS. Where could the problem be?
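One way to narrow that down might be to record call graphs on a single OSD while the FIO run is going, to see what the _raw_spin_lock time is actually being spent on (a sketch; picking the first ceph-osd PID is just an assumption):
---
# record ~30 seconds of call graphs for one OSD process during the benchmark
perf record -g -p $(pidof ceph-osd | awk '{print $1}') -- sleep 30

# then look at which call paths end up in _raw_spin_lock
perf report --stdio | less
---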
Megov Igor
CIO, Yuterra
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of YeYin <eyniy@xxxxxx>
Sent: 17 August 2015 12:58
To: ceph-users
Subject: tcmalloc use a lot of CPU

Hi, all,

When I do a performance test with rados bench, I find tcmalloc consumes a lot of CPU:
Samples: 265K of event 'cycles', Event count (approx.): 104385445900
+ 27.58% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
+ 15.25% libtcmalloc.so.4.1.0 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long,
+ 12.20% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
+  1.63% perf                 [.] append_chain
+  1.39% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
+  1.02% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)
+  0.85% libtcmalloc.so.4.1.0 [.] 0x0000000000017e6f
+  0.75% libtcmalloc.so.4.1.0 [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
+  0.67% libc-2.12.so         [.] memcpy
+  0.53% libtcmalloc.so.4.1.0 [.] operator delete(void*)
Ceph version:
# ceph --version
ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e)

Kernel version: 3.10.83
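If it helps, one quick way to confirm which tcmalloc build the OSD daemons are actually linked against (the binary path here is an assumption and may differ per distribution):
---
ldd /usr/bin/ceph-osd | grep tcmalloc
---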
Is this phenomenon normal? Does anyone have an idea about this problem?

Thanks,
Ye
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com