Hi, I've experienced exactly the same with 14.2.5 and upgraded to 14.2.6. I'm running 7 node cluster with ~ 500 OSDs. Since the upgrade the CPU for ceph-mgr is back to normal, and ceph balancer status is responsive. However, balancing is still not working... but this is another issue. Thomas Am 16.01.2020 um 11:15 schrieb Wido den Hollander: > Anybody upgraded to 14.2.6 yet? > > On a 1800 OSD cluster I see that ceph-mgr is consuming 200 to 450% CPU > on a 4C/8T system (Intel Xeon E3-1230 3.3Ghz CPU). > > The logs don't show anything very special, it's just that the mgr is > super busy. > > I noticed this when I executed: > > $ ceph balancer status > > That command wouldn't return and then I checked the mgr. Only after > restarting ceph-mgr the balancer module returned results again. It > didn't change the CPU usage, it's still consuming a lot of CPU, but at > least the balancer seems to work again. > > Wido > > On 1/9/20 10:21 AM, Lars Täuber wrote: >> yesterday: >> https://ceph.io/releases/v14-2-6-nautilus-released/ >> >> >> Cheers, >> Lars >> >> Thu, 9 Jan 2020 10:10:12 +0100 >> Wido den Hollander <wido@xxxxxxxx> ==> Neha Ojha <nojha@xxxxxxxxxx>, Sasha Litvak <alexander.v.litvak@xxxxxxxxx> : >>> On 12/24/19 9:19 PM, Neha Ojha wrote: >>>> The root cause of this issue is the overhead added by the network ping >>>> time monitoring feature for the mgr to process. >>>> We have a fix that disables sending the network ping times related >>>> stats to the mgr and Eric has helped verify the fix(Thanks Eric!) - >>>> https://tracker.ceph.com/issues/43364#note-9. We'll get this fix out >>>> in 14.2.6 after the holidays. >>>> >>> It's after the holidays now and this is affecting a lot of deployments. >>> Can people expect 14.2.6 soon? >>> >>> Wido >>> >>>> >>>> On Fri, Dec 20, 2019 at 6:24 PM Neha Ojha <nojha@xxxxxxxxxx> wrote: >>>>> Not yet, but we have a theory and a test build in >>>>> https://tracker.ceph.com/issues/43364#note-6, if anybody would like to >>>>> give it a try. >>>>> >>>>> Thanks, >>>>> Neha >>>>> >>>>> On Fri, Dec 20, 2019 at 2:31 PM Sasha Litvak >>>>> <alexander.v.litvak@xxxxxxxxx> wrote: >>>>>> Was the root cause found and fixed? If so, will the fix be available in 14.2.6 or sooner? >>>>>> >>>>>> On Thu, Dec 19, 2019 at 5:48 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote: >>>>>>> Hi Paul, >>>>>>> >>>>>>> >>>>>>> Thanks for gathering this! It looks to me like at the very least we >>>>>>> should redo the fixed_u_to_string and fixed_to_string functions in >>>>>>> common/Formatter.cc. That alone looks like it's having a pretty >>>>>>> significant impact. >>>>>>> >>>>>>> >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>> On 12/19/19 2:09 PM, Paul Mezzanini wrote: >>>>>>>> Based on what we've seen with perf, we think this is the relevant section. (attached is also the whole file) >>>>>>>> >>>>>>>> Thread: 73 (mgr-fin) - 1000 samples >>>>>>>> >>>>>>>> + 100.00% clone >>>>>>>> + 100.00% start_thread >>>>>>>> + 100.00% Finisher::finisher_thread_entry() >>>>>>>> + 99.40% Context::complete(int) >>>>>>>> | + 99.40% FunctionContext::finish(int) >>>>>>>> | + 99.40% ActivePyModule::notify(std::string const&, std::string const&) >>>>>>>> | + 91.30% PyObject_CallMethod >>>>>>>> | | + 91.30% call_function_tail >>>>>>>> | | + 91.30% PyObject_Call >>>>>>>> | | + 91.30% instancemethod_call >>>>>>>> | | + 91.30% PyObject_Call >>>>>>>> | | + 91.30% function_call >>>>>>>> | | + 91.30% PyEval_EvalCodeEx >>>>>>>> | | + 88.40% PyEval_EvalFrameEx >>>>>>>> | | | + 88.40% PyEval_EvalFrameEx >>>>>>>> | | | + 88.40% ceph_state_get(BaseMgrModule*, _object*) >>>>>>>> | | | + 88.40% ActivePyModules::get_python(std::string const&) >>>>>>>> | | | + 51.10% PGMap::dump_osd_stats(ceph::Formatter*) const >>>>>>>> | | | | + 51.10% osd_stat_t::dump(ceph::Formatter*) const >>>>>>>> | | | | + 22.50% ceph::fixed_u_to_string(unsigned long, int) >>>>>>>> | | | | | + 10.50% std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode) >>>>>>>> | | | | | | + 9.30% std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) >>>>>>>> | | | | | | | + 7.00% std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) >>>>>>>> | | | | | | | | + 1.60% std::ctype<char> const& std::use_facet<std::ctype<char> >(std::locale const&) >>>>>>>> | | | | | | | | | + 1.50% __dynamic_cast >>>>>>>> | | | | | | | | | + 0.80% __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const >>>>>>>> | | | | | | | | + 1.40% bool std::has_facet<std::ctype<char> >(std::locale const&) >>>>>>>> | | | | | | | | | + 1.30% __dynamic_cast >>>>>>>> | | | | | | | | | + 0.90% __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const >>>>>>>> | | | | | | | | + 1.10% bool std::has_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | | + 0.90% __dynamic_cast >>>>>>>> | | | | | | | | + 1.00% bool std::has_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | | + 0.70% __dynamic_cast >>>>>>>> | | | | | | | | | + 0.10% std::locale::id::_M_id() const >>>>>>>> | | | | | | | | | + 0.10% _ZNKSt6locale2id5_M_idEv@plt >>>>>>>> | | | | | | | | + 0.80% std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.70% std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.10% _ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale@plt >>>>>>>> | | | | | | | + 2.00% std::ios_base::_M_init() >>>>>>>> | | | | | | | | + 0.80% std::locale::operator=(std::locale const&) >>>>>>>> | | | | | | | | + 0.80% std::locale::locale() >>>>>>>> | | | | | | | | + 0.30% std::locale::~locale() >>>>>>>> | | | | | | | | + 0.10% _ZNSt6localeC1Ev@plt >>>>>>>> | | | | | | | + 0.20% _ZNSt8ios_base7_M_initEv@plt >>>>>>>> | | | | | | + 0.90% std::locale::locale() >>>>>>>> | | | | | | + 0.10% std::ios_base::ios_base() >>>>>>>> | | | | | | + 0.10% _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E@plt >>>>>>>> | | | | | + 2.80% std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) >>>>>>>> | | | | | | + 2.40% std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const >>>>>>>> | | | | | | | + 2.10% std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const >>>>>>>> | | | | | | | | + 1.60% std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) >>>>>>>> | | | | | | | | | + 1.40% std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) >>>>>>>> | | | | | | | | | | + 0.90% std::string::reserve(unsigned long) >>>>>>>> | | | | | | | | | | + 0.10% std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::_M_sync(char*, unsigned long, unsigned long) >>>>>>>> | | | | | | | | | | + 0.10% _ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE7_M_syncEPcmm@plt >>>>>>>> | | | | | | | | | + 0.20% __memcpy_ssse3_back >>>>>>>> | | | | | | | | + 0.20% ??? >>>>>>>> | | | | | | | | + 0.10% std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_pad(char, long, std::ios_base&, char*, char const*, int&) const >>>>>>>> | | | | | | | + 0.10% _ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE13_M_insert_intImEES3_S3_RSt8ios_basecT_@plt >>>>>>>> | | | | | | + 0.10% std::ostream::sentry::sentry(std::ostream&) >>>>>>>> | | | | | + 2.80% std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const >>>>>>>> | | | | | | + 1.00% std::string::assign(std::string const&) >>>>>>>> | | | | | | + 0.90% char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag) [clone .part.1796] >>>>>>>> | | | | | + 1.50% std::string::append(char const*, unsigned long) >>>>>>>> | | | | | | + 1.20% std::string::reserve(unsigned long) >>>>>>>> | | | | | | + 0.60% std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) >>>>>>>> | | | | | | + 0.10% tc_free >>>>>>>> | | | | | + 1.20% std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .isra.97] [clone .part.98] >>>>>>>> | | | | | + 1.00% std::string::append(std::string const&) >>>>>>>> | | | | | | + 0.70% std::string::reserve(unsigned long) >>>>>>>> | | | | | | + 0.10% __memcpy_ssse3_back >>>>>>>> | | | | | + 1.00% std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&, unsigned long, unsigned long) >>>>>>>> | | | | | | + 0.80% char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag) [clone .part.220] >>>>>>>> | | | | | + 0.40% std::locale::~locale() >>>>>>>> | | | | | + 0.20% tc_free >>>>>>>> | | | | | + 0.20% __strlen_sse2_pminub >>>>>>>> | | | | | + 0.10% std::ios_base::~ios_base() >>>>>>>> | | | | | + 0.10% _ZNSt8ios_baseD2Ev@plt >>>>>>>> | | | | | + 0.10% _ZNKSt15basic_stringbufIcSt11char_traitsIcESaIcEE3strEv@plt >>>>>>>> | | | | + 18.20% PyFormatter::open_object_section(char const*) >>>>>>>> | | | | | + 17.10% PyDict_New >>>>>>>> | | | | | | + 16.70% _PyObject_GC_New >>>>>>>> | | | | | | + 16.70% _PyObject_GC_Malloc >>>>>>>> | | | | | | + 16.60% collect >>>>>>>> | | | | | | | + 8.10% dict_traverse >>>>>>>> | | | | | | | | + 3.20% visit_reachable >>>>>>>> | | | | | | | | | + 0.10% type_is_gc >>>>>>>> | | | | | | | | + 2.80% visit_decref >>>>>>>> | | | | | | | | + 1.60% PyDict_Next >>>>>>>> | | | | | | | + 1.30% list_traverse >>>>>>>> | | | | | | | | + 0.40% visit_decref >>>>>>>> | | | | | | | | + 0.30% visit_reachable >>>>>>>> | | | | | | | + 0.60% func_traverse >>>>>>>> | | | | | | | + 0.40% _PyDict_MaybeUntrack >>>>>>>> | | | | | | | + 0.10% type_traverse >>>>>>>> | | | | | | | + 0.10% subtype_traverse >>>>>>>> | | | | | | | + 0.10% set_traverse >>>>>>>> | | | | | | | + 0.10% class_traverse >>>>>>>> | | | | | | | + 0.10% _PyDict_MaybeUntrack@plt >>>>>>>> | | | | | | + 0.10% PyObject_Malloc >>>>>>>> | | | | | + 1.00% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | | + 0.40% PyString_FromString >>>>>>>> | | | | | + 0.20% dict_set_item_by_hash_or_entry >>>>>>>> | | | | | + 0.20% PyDict_SetItem >>>>>>>> | | | | | + 0.10% app1 >>>>>>>> | | | | + 6.60% ceph::Formatter::dump_format_unquoted(char const*, char const*, ...) >>>>>>>> | | | | | + 6.60% PyFormatter::dump_format_va(char const*, char const*, bool, char const*, __va_list_tag*) >>>>>>>> | | | | | + 3.90% __vsnprintf_chk >>>>>>>> | | | | | | + 3.40% vfprintf >>>>>>>> | | | | | | | + 0.50% strchrnul >>>>>>>> | | | | | | | + 0.40% __GI__IO_default_xsputn >>>>>>>> | | | | | | | + 0.20% tc_free >>>>>>>> | | | | | | | + 0.10% free@plt >>>>>>>> | | | | | | | + 0.10% (anonymous namespace)::free_null_or_invalid(void*, void (*)(void*)) [clone .constprop.41] >>>>>>>> | | | | | | + 0.20% _IO_no_init >>>>>>>> | | | | | | + 0.10% _IO_str_init_static_internal >>>>>>>> | | | | | + 1.50% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | | | + 0.50% PyString_FromString >>>>>>>> | | | | | | + 0.40% PyDict_SetItem >>>>>>>> | | | | | | + 0.10% dict_set_item_by_hash_or_entry >>>>>>>> | | | | | | + 0.10% PyDict_SetItem@plt >>>>>>>> | | | | | + 1.20% PyString_FromString >>>>>>>> | | | | | + 0.60% PyObject_Malloc >>>>>>>> | | | | | + 0.20% __strlen_sse2_pminub >>>>>>>> | | | | | + 0.10% __memcpy_ssse3_back >>>>>>>> | | | | + 0.90% ctime_r >>>>>>>> | | | | + 0.80% PyFormatter::open_array_section(char const*) >>>>>>>> | | | | + 0.40% std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .isra.846] [clone .part.847] >>>>>>>> | | | | + 0.30% PyFormatter::dump_int(char const*, long) >>>>>>>> | | | | + 0.20% PyFormatter::close_section() >>>>>>>> | | | | + 0.10% tc_free >>>>>>>> | | | | + 0.10% std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) >>>>>>>> | | | | + 0.10% std::_Rb_tree_increment(std::_Rb_tree_node_base const*) >>>>>>>> | | | | + 0.10% pow2_hist_t::dump(ceph::Formatter*) const >>>>>>>> | | | | + 0.10% objectstore_perf_stat_t::dump(ceph::Formatter*) const >>>>>>>> | | | | + 0.10% PyFormatter::dump_string(char const*, std::basic_string_view<char, std::char_traits<char> >) >>>>>>>> | | | | + 0.10% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | + 21.80% Mutex::lock(bool) >>>>>>>> | | | | + 21.80% pthread_mutex_lock >>>>>>>> | | | | + 21.80% _L_lock_883 >>>>>>>> | | | | + 21.80% __lll_lock_wait >>>>>>>> | | | + 11.70% PGMap::dump(ceph::Formatter*) const >>>>>>>> | | | | + 11.70% PGMap::dump_pg_stats(ceph::Formatter*, bool) const >>>>>>>> | | | | + 10.90% pg_stat_t::dump(ceph::Formatter*) const >>>>>>>> | | | | | + 4.20% PyFormatter::dump_stream(char const*) >>>>>>>> | | | | | | + 2.80% std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) >>>>>>>> | | | | | | | + 2.10% std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) >>>>>>>> | | | | | | | | + 0.50% std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.50% bool std::has_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.40% bool std::has_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.20% std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) >>>>>>>> | | | | | | | | + 0.20% std::ctype<char> const& std::use_facet<std::ctype<char> >(std::locale const&) >>>>>>>> | | | | | | | | + 0.20% bool std::has_facet<std::ctype<char> >(std::locale const&) >>>>>>>> | | | | | | | | + 0.10% _ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale@plt >>>>>>>> | | | | | | | + 0.70% std::ios_base::_M_init() >>>>>>>> | | | | | | + 0.50% tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int) >>>>>>>> | | | | | | + 0.40% std::string::assign(char const*, unsigned long) >>>>>>>> | | | | | | + 0.20% std::locale::locale() >>>>>>>> | | | | | | + 0.10% std::ios_base::ios_base() >>>>>>>> | | | | | + 1.80% object_stat_collection_t::dump(ceph::Formatter*) const >>>>>>>> | | | | | | + 1.70% object_stat_sum_t::dump(ceph::Formatter*) const >>>>>>>> | | | | | | | + 1.40% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | | | | | + 0.60% dictresize >>>>>>>> | | | | | | | | + 0.30% PyString_FromString >>>>>>>> | | | | | | | | + 0.20% PyDict_SetItem >>>>>>>> | | | | | | | | + 0.10% dict_set_item_by_hash_or_entry >>>>>>>> | | | | | | | + 0.20% PyFormatter::dump_int(char const*, long) >>>>>>>> | | | | | | + 0.10% PyFormatter::open_object_section(char const*) >>>>>>>> | | | | | + 1.80% PyFormatter::open_array_section(char const*) >>>>>>>> | | | | | | + 1.60% PyList_New >>>>>>>> | | | | | | | + 1.60% _PyObject_GC_New >>>>>>>> | | | | | | | + 1.60% _PyObject_GC_Malloc >>>>>>>> | | | | | | | + 1.60% collect >>>>>>>> | | | | | | | + 0.80% dict_traverse >>>>>>>> | | | | | | | + 0.10% subtype_traverse >>>>>>>> | | | | | | | + 0.10% list_traverse >>>>>>>> | | | | | | | + 0.10% func_traverse >>>>>>>> | | | | | | | + 0.10% _PyDict_MaybeUntrack >>>>>>>> | | | | | | + 0.20% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | | + 1.70% utime_t::localtime(std::ostream&) const >>>>>>>> | | | | | | + 1.00% std::ostream& std::ostream::_M_insert<long>(long) >>>>>>>> | | | | | | | + 0.60% std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, long) const >>>>>>>> | | | | | | + 0.30% std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) >>>>>>>> | | | | | | + 0.20% __tz_convert >>>>>>>> | | | | | + 0.90% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | | + 0.20% pg_state_string(unsigned long) >>>>>>>> | | | | | + 0.20% operator<<(std::ostream&, eversion_t const&) [clone .isra.103] >>>>>>>> | | | | | + 0.10% std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) >>>>>>>> | | | | + 0.40% PyFormatter::dump_stream(char const*) >>>>>>>> | | | | + 0.30% operator<<(std::ostream&, pg_t const&) >>>>>>>> | | | | + 0.10% PyFormatter::open_object_section(char const*) >>>>>>>> | | | + 2.70% PyFormatter::finish_pending_streams() >>>>>>>> | | | | + 1.00% std::_List_base<std::shared_ptr<PyFormatter::PendingStream>, std::allocator<std::shared_ptr<PyFormatter::PendingStream> > >::_M_clear() >>>>>>>> | | | | | + 0.40% std::_Sp_counted_ptr_inplace<PyFormatter::PendingStream, std::allocator<PyFormatter::PendingStream>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() >>>>>>>> | | | | | + 0.20% tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) >>>>>>>> | | | | + 0.70% PyFormatter::dump_pyobject(char const*, _object*) >>>>>>>> | | | | + 0.50% std::string::replace(unsigned long, unsigned long, char const*, unsigned long) >>>>>>>> | | | | + 0.30% PyString_FromString >>>>>>>> | | | + 1.10% PyEval_RestoreThread >>>>>>>> | | | + 1.10% PyThread_acquire_lock >>>>>>>> | | | + 1.10% sem_wait@@GLIBC_2.2.5 >>>>>>>> | | | + 1.10% __new_sem_wait_slow.constprop.0 >>>>>>>> | | | + 1.10% do_futex_wait.constprop.1 >>>>>>>> | | + 2.90% frame_dealloc >>>>>>>> | | + 2.90% dict_dealloc >>>>>>>> | | + 2.90% list_dealloc >>>>>>>> | | + 2.90% dict_dealloc >>>>>>>> | | + 1.90% list_dealloc >>>>>>>> | | | + 1.90% dict_dealloc >>>>>>>> | | | + 1.70% list_dealloc >>>>>>>> | | | + 1.50% dict_dealloc >>>>>>>> | | | | + 0.90% dict_dealloc >>>>>>>> | | | | + 0.10% PyObject_Free >>>>>>>> | | | + 0.10% tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) >>>>>>>> | | + 0.30% PyObject_Free >>>>>>>> | | + 0.20% dict_dealloc >>>>>>>> | + 8.10% Gil::Gil(SafeThreadState&, bool) >>>>>>>> | + 8.10% PyEval_RestoreThread >>>>>>>> | + 8.10% PyThread_acquire_lock >>>>>>>> | + 8.10% sem_wait@@GLIBC_2.2.5 >>>>>>>> | + 8.10% __new_sem_wait_slow.constprop.0 >>>>>>>> | + 8.10% do_futex_wait.constprop.1 >>>>>>>> + 0.60% std::condition_variable::wait(std::unique_lock<std::mutex>&) >>>>>>>> >>>>>>>> -- >>>>>>>> Paul Mezzanini >>>>>>>> Sr Systems Administrator / Engineer, Research Computing >>>>>>>> Information & Technology Services >>>>>>>> Finance & Administration >>>>>>>> Rochester Institute of Technology >>>>>>>> o:(585) 475-3245 | pfmeec@xxxxxxx >>>>>>>> >>>>>>>> CONFIDENTIALITY NOTE: The information transmitted, including attachments, is >>>>>>>> intended only for the person(s) or entity to which it is addressed and may >>>>>>>> contain confidential and/or privileged material. Any review, retransmission, >>>>>>>> dissemination or other use of, or taking of any action in reliance upon this >>>>>>>> information by persons or entities other than the intended recipient is >>>>>>>> prohibited. If you received this in error, please contact the sender and >>>>>>>> destroy any copies of this information. >>>>>>>> ------------------------ >>>>>>>> >>>>>>>> ________________________________________ >>>>>>>> From: Mark Nelson <mnelson@xxxxxxxxxx> >>>>>>>> Sent: Thursday, December 19, 2019 11:47 AM >>>>>>>> To: ceph-users@xxxxxxx >>>>>>>> Subject: Re: High CPU usage by ceph-mgr in 14.2.5 >>>>>>>> >>>>>>>> If you can get a wallclock profiler on the mgr process we might be able >>>>>>>> to figure out specifics of what's taking so much time (ie processing >>>>>>>> pg_summary or something else). Assuming you have gdb with the python >>>>>>>> bindings and the ceph debug packages installed, if you (are anyone) >>>>>>>> could try gdbpmp on the 100% mgr process that would be fantastic. >>>>>>>> >>>>>>>> >>>>>>>> https://github.com/markhpc/gdbpmp >>>>>>>> >>>>>>>> >>>>>>>> gdbpmp.py -p`pidof ceph-mgr` -n 1000 -o mgr.gdbpmp >>>>>>>> >>>>>>>> >>>>>>>> If you want to view the results: >>>>>>>> >>>>>>>> >>>>>>>> gdbpmp.py -i mgr.gdbpmp -t 1 >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Mark >>>>>>>> >>>>>>>> >>>>>>>> On 12/19/19 6:29 AM, Paul Emmerich wrote: >>>>>>>>> We're also seeing unusually high mgr CPU usage on some setups, the >>>>>>>>> only thing they have in common seem to > 300 OSDs. >>>>>>>>> >>>>>>>>> Threads using the CPU are "mgr-fin" and and "ms_dispatch" >>>>>>>>> >>>>>>>>> >>>>>>>>> Paul >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Paul Emmerich >>>>>>>>> >>>>>>>>> Looking for help with your Ceph cluster? Contact us at https://croit.io >>>>>>>>> >>>>>>>>> croit GmbH >>>>>>>>> Freseniusstr. 31h >>>>>>>>> 81247 München >>>>>>>>> www.croit.io <http://www.croit.io> >>>>>>>>> Tel: +49 89 1896585 90 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Dec 19, 2019 at 9:40 AM Serkan Çoban <cobanserkan@xxxxxxxxx >>>>>>>>> <mailto:cobanserkan@xxxxxxxxx>> wrote: >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> 1500 OSDs, mgr is constant %100 after upgrading from 14.2.2 to 14.2.5. >>>>>>>>> >>>>>>>>> On Thu, Dec 19, 2019 at 11:06 AM Toby Darling >>>>>>>>> <toby@xxxxxxxxxxxxxxxxx <mailto:toby@xxxxxxxxxxxxxxxxx>> wrote: >>>>>>>>> > >>>>>>>>> > On 18/12/2019 22:40, Bryan Stillwell wrote: >>>>>>>>> > > That's how we noticed it too. Our graphs went silent after >>>>>>>>> the upgrade >>>>>>>>> > > completed. Is your large cluster over 350 OSDs? >>>>>>>>> > >>>>>>>>> > A 'me too' on this - graphs have gone quiet, and mgr is using >>>>>>>>> 100% CPU. >>>>>>>>> > This happened when we grew our 14.2.5 cluster from 328 to 436 OSDs. >>>>>>>>> > >>>>>>>>> > Cheers >>>>>>>>> > Toby >>>>>>>>> > -- >>>>>>>>> > Toby Darling, Scientific Computing (2N249) >>>>>>>>> > MRC Laboratory of Molecular Biology >>>>>>>>> > Francis Crick Avenue >>>>>>>>> > Cambridge Biomedical Campus >>>>>>>>> > Cambridge CB2 0QH >>>>>>>>> > Phone 01223 267070 >>>>>>>>> > _______________________________________________ >>>>>>>>> > ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>>>> <mailto:ceph-users@xxxxxxx> >>>>>>>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>>>> <mailto:ceph-users-leave@xxxxxxx> >>>>>>>>> _______________________________________________ >>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>>>> <mailto:ceph-users@xxxxxxx> >>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>>>> <mailto:ceph-users-leave@xxxxxxx> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>>> _______________________________________________ >>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>>> _______________________________________________ >>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>>>> _______________________________________________ >>>>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>> _______________________________________________ >>>> ceph-users mailing list -- ceph-users@xxxxxxx >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >>>> >>> _______________________________________________ >>> ceph-users mailing list -- ceph-users@xxxxxxx >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx >> > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx