Hello,
Thanks for all your help.
Is dd an option of some command? At least on Debian/Ubuntu it is an application for copying blocks, so the command fails.
I can't change the configuration for now, but I'll try later.
As for the logs, I haven't seen anything containing "warning", "error", "failed", "message" or similar, so it looks like there are no messages of that kind.
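A log check like the one described here could be scripted roughly as follows. This is only a sketch: the keyword list comes from this thread (including the "failed to decode message of type" string asked about below), and the function name `find_suspicious` is a hypothetical helper, not part of any Ceph tool.

```python
import re

# Keywords mentioned in this thread; extend the list as needed.
PATTERNS = ["warning", "error", "failed", "failed to decode message of type"]


def find_suspicious(lines, patterns=PATTERNS):
    """Return (line_number, text) pairs for lines matching any pattern.

    Matching is case-insensitive and treats each pattern as a literal string.
    """
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To scan a real MDS log, pass an open file object for it to `find_suspicious`; an empty result means none of the keywords appear.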
Greetings!!
2018-07-25 14:48 GMT+02:00 Yan, Zheng <ukernel@xxxxxxxxx>:
On Wed, Jul 25, 2018 at 8:12 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Wed, Jul 25, 2018 at 5:04 PM Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I've attached the PDF.
> >
> > I don't know if it is important, but I changed the configuration and restarted the servers after dumping that heap file. I've changed the memory_limit to 25Mb to test whether RAM usage stays at acceptable values.
> >
>
> Looks like there is a memory leak in the async messenger. What's the output of
> "dd /usr/bin/ceph-mds"? Could you try the simple messenger (add "ms type =
> simple" to the 'global' section of ceph.conf)?
>
Besides, are there any suspicious messages in the mds log, such as "failed
to decode message of type"?
> Regards
> Yan, Zheng
>
> > Greetings!
> >
> > 2018-07-25 2:53 GMT+02:00 Yan, Zheng <ukernel@xxxxxxxxx>:
> >>
> >> On Wed, Jul 25, 2018 at 4:52 AM Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I've run the profiler for about 5-6 minutes and this is what I've got:
> >> >
> >>
> >> Please run pprof --pdf /usr/bin/ceph-mds
> >> /var/log/ceph/ceph-mds.x.profile.<largest number>.heap >
> >> /tmp/profile.pdf and send me the PDF.
> >>
> >>
> >>
> >> > ------------------------------------------------------------ ------------------------------ --
> >> > ------------------------------------------------------------ ------------------------------ --
> >> > ------------------------------------------------------------ ------------------------------ --
> >> > Using local file /usr/bin/ceph-mds.
> >> > Using local file /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap.
> >> > Total: 400.0 MB
> >> > 362.5 90.6% 90.6% 362.5 90.6% ceph::buffer::create_aligned_in_mempool
> >> > 20.4 5.1% 95.7% 29.8 7.5% CDir::_load_dentry
> >> > 5.9 1.5% 97.2% 6.9 1.7% CDir::add_primary_dentry
> >> > 4.7 1.2% 98.4% 4.7 1.2% ceph::logging::Log::create_entry
> >> > 1.8 0.5% 98.8% 1.8 0.5% std::_Rb_tree::_M_emplace_hint_unique
> >> > 1.8 0.5% 99.3% 2.2 0.5% compact_map_base::decode
> >> > 0.6 0.1% 99.4% 0.7 0.2% CInode::add_client_cap
> >> > 0.5 0.1% 99.5% 0.5 0.1% std::__cxx11::basic_string::_M_mutate
> >> > 0.4 0.1% 99.6% 0.4 0.1% SimpleLock::more
> >> > 0.4 0.1% 99.7% 0.4 0.1% MDCache::add_inode
> >> > 0.3 0.1% 99.8% 0.3 0.1% CDir::add_to_bloom
> >> > 0.2 0.1% 99.9% 0.2 0.1% CDir::steal_dentry
> >> > 0.2 0.0% 99.9% 0.2 0.0% CInode::get_or_open_dirfrag
> >> > 0.1 0.0% 99.9% 0.8 0.2% std::enable_if::type decode
> >> > 0.1 0.0% 100.0% 0.1 0.0% ceph::buffer::list::crc32c
> >> > 0.1 0.0% 100.0% 0.1 0.0% decode_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% OpTracker::create_request
> >> > 0.0 0.0% 100.0% 0.0 0.0% TrackedOp::TrackedOp
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::vector::_M_emplace_back_aux
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::_Rb_tree::_M_insert_unique
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::add_dirfrag
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDLog::_prepare_new_segment
> >> > 0.0 0.0% 100.0% 0.0 0.0% DispatchQueue::enqueue
> >> > 0.0 0.0% 100.0% 0.0 0.0% ceph::buffer::list::push_back
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::prepare_new_inode
> >> > 0.0 0.0% 100.0% 365.6 91.4% EventCenter::process_events
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::_Rb_tree::_M_copy
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::add_null_dentry
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::check_inode_max_size
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::add_client_lease
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::project_inode
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__cxx11::list::_M_insert
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDBalancer::handle_heartbeat
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDBalancer::send_heartbeat
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_GatherBase::C_GatherSub::complete
> >> > 0.0 0.0% 100.0% 0.0 0.0% EventCenter::create_time_event
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::_omap_fetch
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::handle_inode_file_caps
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::_Rb_tree::_M_insert_equal
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::issue_caps
> >> > 0.0 0.0% 100.0% 0.1 0.0% MDLog::_submit_thread
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::_wait_for_flush
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::wrap_finisher
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSCacheObject::add_waiter
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__cxx11::list::insert
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__detail::_Map_base::operator[]
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::mark_updated_scatterlock
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::_Rb_tree::_M_insert_
> >> > 0.0 0.0% 100.0% 0.0 0.0% alloc_ptr::operator->
> >> > 0.0 0.0% 100.0% 0.0 0.0% ceph::buffer::list::append@5c1560
> >> > 0.0 0.0% 100.0% 0.0 0.0% ceph::buffer::malformed_input::~malformed_input
> >> > 0.0 0.0% 100.0% 0.0 0.0% compact_set_base::insert
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::add_waiter
> >> > 0.0 0.0% 100.0% 0.0 0.0% InoTable::apply_release_ids
> >> > 0.0 0.0% 100.0% 0.0 0.0% InoTable::project_release_ids
> >> > 0.0 0.0% 100.0% 2.2 0.5% InodeStoreBase::decode_bare
> >> > 0.0 0.0% 100.0% 0.0 0.0% interval_set::erase
> >> > 0.0 0.0% 100.0% 1.1 0.3% std::map::operator[]
> >> > 0.0 0.0% 100.0% 0.0 0.0% Beacon::_send
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSDaemon::reset_tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% MgrClient::send_report
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::_do_flush
> >> > 0.0 0.0% 100.0% 0.1 0.0% Locker::rdlock_start
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::_get_waiter
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::~CDentry
> >> > 0.0 0.0% 100.0% 0.0 0.0% MonClient::schedule_tick
> >> > 0.0 0.0% 100.0% 0.1 0.0% AsyncConnection::handle_write
> >> > 0.0 0.0% 100.0% 0.1 0.0% AsyncConnection::prepare_send_message
> >> > 0.0 0.0% 100.0% 365.5 91.4% AsyncConnection::process
> >> > 0.0 0.0% 100.0% 0.3 0.1% AsyncConnection::send_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% AsyncConnection::tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% AsyncMessenger::_send_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% AsyncMessenger::send_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% AsyncMessenger::submit_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% Beacon::notify_health
> >> > 0.0 0.0% 100.0% 0.2 0.1% CDentry::CDentry
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::_mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::auth_pin
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDentry::pop_projected_linkage
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::_mark_dirty
> >> > 0.0 0.0% 100.0% 29.8 7.5% CDir::_omap_fetched
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::auth_pin
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::fetch
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::link_inode_work
> >> > 0.0 0.0% 100.0% 0.0 0.0% CDir::link_primary_inode
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::_mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::add_waiter
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::auth_pin
> >> > 0.0 0.0% 100.0% 0.4 0.1% CInode::choose_ideal_loner
> >> > 0.0 0.0% 100.0% 0.4 0.1% CInode::encode_inodestat
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::mark_dirty_parent
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::mark_dirty_rstat
> >> > 0.0 0.0% 100.0% 0.0 0.0% CInode::pop_and_dirty_projected_inode
> >> > 0.0 0.0% 100.0% 0.4 0.1% CInode::set_loner_cap
> >> > 0.0 0.0% 100.0% 29.8 7.5% C_IO_Dir_OMAP_Fetched::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_Locker_FileUpdate_finish::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_MDL_CheckMaxSize::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_MDS_RetryRequest::~C_MDS_RetryRequest
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_MDS_inode_update_finish::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_MDS_openc_finish::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_MDS_session_finish::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% C_OnFinisher::finish
> >> > 0.0 0.0% 100.0% 1.2 0.3% Context::complete
> >> > 0.0 0.0% 100.0% 3.7 0.9% DispatchQueue::DispatchThread::entry
> >> > 0.0 0.0% 100.0% 3.7 0.9% DispatchQueue::entry
> >> > 0.0 0.0% 100.0% 0.9 0.2% DispatchQueue::fast_dispatch
> >> > 0.0 0.0% 100.0% 2.1 0.5% DispatchQueue::pre_dispatch
> >> > 0.0 0.0% 100.0% 0.0 0.0% EMetaBlob::print
> >> > 0.0 0.0% 100.0% 0.0 0.0% EventCenter::process_time_events
> >> > 0.0 0.0% 100.0% 29.9 7.5% Finisher::finisher_thread_entry
> >> > 0.0 0.0% 100.0% 0.4 0.1% FunctionContext::finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::_flush
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::_write_head
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::flush
> >> > 0.0 0.0% 100.0% 0.0 0.0% Journaler::wait_for_flush
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::_do_cap_release
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::_do_cap_update
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::_drop_non_rdlocks
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::_rdlock_kick
> >> > 0.0 0.0% 100.0% 0.2 0.0% Locker::acquire_locks
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::adjust_cap_wanted
> >> > 0.0 0.0% 100.0% 0.2 0.0% Locker::dispatch
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::drop_locks
> >> > 0.0 0.0% 100.0% 0.4 0.1% Locker::eval
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::eval_gather
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::file_update_finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::handle_client_cap_release
> >> > 0.0 0.0% 100.0% 0.2 0.0% Locker::handle_client_caps
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::handle_client_lease
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::handle_file_lock
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::handle_lock
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::issue_caps_set
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::issue_client_lease
> >> > 0.0 0.0% 100.0% 0.6 0.2% Locker::issue_new_caps
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::local_wrlock_start
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::nudge_log
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_eval
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_mix
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_nudge
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_writebehind
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::scatter_writebehind_finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::send_lock_message@42d5b0
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::send_lock_message@42f2b0
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::share_inode_max_size
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::simple_lock
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::simple_sync
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::try_eval@43da60
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::try_eval@441fb0
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::wrlock_finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::wrlock_force
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::wrlock_start
> >> > 0.0 0.0% 100.0% 0.0 0.0% Locker::xlock_start
> >> > 0.0 0.0% 100.0% 0.0 0.0% MClientCaps::print
> >> > 0.0 0.0% 100.0% 0.0 0.0% MClientRequest::decode_payload
> >> > 0.0 0.0% 100.0% 0.0 0.0% MClientRequest::print
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDBalancer::prep_rebalance
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDBalancer::proc_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::check_memory_usage
> >> > 0.0 0.0% 100.0% 0.2 0.1% MDCache::path_traverse
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::predirty_journal_parents
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::request_cleanup
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::request_finish
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::request_start
> >> > 0.0 0.0% 100.0% 0.3 0.1% MDCache::trim
> >> > 0.0 0.0% 100.0% 0.3 0.1% MDCache::trim_dentry
> >> > 0.0 0.0% 100.0% 0.3 0.1% MDCache::trim_lru
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDCache::truncate_inode
> >> > 0.0 0.0% 100.0% 0.1 0.0% MDLog::SubmitThread::entry
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDLog::_start_new_segment
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDLog::_submit_entry
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDLog::submit_entry
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSCacheObject::finish_waiting
> >> > 0.0 0.0% 100.0% 0.2 0.1% MDSCacheObject::get
> >> > 0.0 0.0% 100.0% 1.7 0.4% MDSDaemon::ms_dispatch
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSDaemon::tick
> >> > 0.0 0.0% 100.0% 29.9 7.5% MDSIOContextBase::complete
> >> > 0.0 0.0% 100.0% 0.7 0.2% MDSInternalContextBase::complete
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSLogContextBase::complete
> >> > 0.0 0.0% 100.0% 0.3 0.1% MDSRank::ProgressThread::entry
> >> > 0.0 0.0% 100.0% 0.7 0.2% MDSRank::_advance_queues
> >> > 0.0 0.0% 100.0% 1.7 0.4% MDSRank::_dispatch
> >> > 0.0 0.0% 100.0% 1.3 0.3% MDSRank::handle_deferrable_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSRank::send_message_client
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSRank::send_message_client_counted@2a9260
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSRank::send_message_client_counted@2a94f0
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSRank::send_message_client_counted@2b1920
> >> > 0.0 0.0% 100.0% 0.0 0.0% MDSRank::send_message_mds
> >> > 0.0 0.0% 100.0% 1.7 0.4% MDSRankDispatcher::ms_dispatch
> >> > 0.0 0.0% 100.0% 0.4 0.1% MDSRankDispatcher::tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% MOSDOp::print
> >> > 0.0 0.0% 100.0% 0.1 0.0% Message::encode
> >> > 0.0 0.0% 100.0% 0.0 0.0% MonClient::_check_auth_rotating
> >> > 0.0 0.0% 100.0% 0.0 0.0% MonClient::_check_auth_tickets
> >> > 0.0 0.0% 100.0% 0.0 0.0% MonClient::_send_mon_message
> >> > 0.0 0.0% 100.0% 0.0 0.0% MonClient::tick
> >> > 0.0 0.0% 100.0% 0.0 0.0% MutationImpl::MutationImpl
> >> > 0.0 0.0% 100.0% 0.1 0.0% MutationImpl::auth_pin
> >> > 0.0 0.0% 100.0% 0.1 0.0% MutationImpl::pin
> >> > 0.0 0.0% 100.0% 0.0 0.0% MutationImpl::start_locking
> >> > 0.0 0.0% 100.0% 365.6 91.4% NetworkStack::get_worker
> >> > 0.0 0.0% 100.0% 0.8 0.2% ObjectOperation::C_ObjectOperation_decodevals::finish
> >> > 0.0 0.0% 100.0% 0.1 0.0% Objecter::_op_submit
> >> > 0.0 0.0% 100.0% 0.1 0.0% Objecter::_op_submit_with_budget
> >> > 0.0 0.0% 100.0% 0.1 0.0% Objecter::_send_op
> >> > 0.0 0.0% 100.0% 0.8 0.2% Objecter::handle_osd_op_reply
> >> > 0.0 0.0% 100.0% 0.8 0.2% Objecter::ms_dispatch
> >> > 0.0 0.0% 100.0% 0.8 0.2% Objecter::ms_fast_dispatch
> >> > 0.0 0.0% 100.0% 0.1 0.0% Objecter::op_submit
> >> > 0.0 0.0% 100.0% 0.0 0.0% Objecter::sg_write_trunc
> >> > 0.0 0.0% 100.0% 0.0 0.0% OpHistory::insert
> >> > 0.0 0.0% 100.0% 0.0 0.0% OpTracker::unregister_inflight_op
> >> > 0.0 0.0% 100.0% 0.0 0.0% PrebufferedStreambuf::overflow
> >> > 0.0 0.0% 100.0% 0.0 0.0% SafeTimer::add_event_after
> >> > 0.0 0.0% 100.0% 0.0 0.0% SafeTimer::add_event_at
> >> > 0.0 0.0% 100.0% 0.4 0.1% SafeTimer::timer_thread
> >> > 0.0 0.0% 100.0% 0.4 0.1% SafeTimerThread::entry
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::_session_logged
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::apply_allocated_inos
> >> > 0.0 0.0% 100.0% 1.1 0.3% Server::dispatch
> >> > 0.0 0.0% 100.0% 1.7 0.4% Server::dispatch_client_request
> >> > 0.0 0.0% 100.0% 0.7 0.2% Server::handle_client_getattr
> >> > 0.0 0.0% 100.0% 0.9 0.2% Server::handle_client_open
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::handle_client_openc
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::handle_client_readdir
> >> > 0.0 0.0% 100.0% 1.1 0.3% Server::handle_client_request
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::handle_client_session
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::handle_client_setattr
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::journal_and_reply
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::journal_close_session
> >> > 0.0 0.0% 100.0% 0.3 0.1% Server::rdlock_path_pin_ref
> >> > 0.0 0.0% 100.0% 0.0 0.0% Server::recall_client_state
> >> > 0.0 0.0% 100.0% 0.5 0.1% Server::reply_client_request
> >> > 0.0 0.0% 100.0% 0.5 0.1% Server::respond_to_request
> >> > 0.0 0.0% 100.0% 0.4 0.1% Server::set_trace_dist
> >> > 0.0 0.0% 100.0% 0.0 0.0% SessionMap::_mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% SessionMap::mark_dirty
> >> > 0.0 0.0% 100.0% 0.0 0.0% SessionMap::remove_session
> >> > 0.0 0.0% 100.0% 0.0 0.0% ceph::buffer::list::iterator_impl::copy
> >> > 0.0 0.0% 100.0% 0.0 0.0% ceph::buffer::list::iterator_impl::copy_shallow
> >> > 0.0 0.0% 100.0% 400.0 100.0% clone
> >> > 0.0 0.0% 100.0% 0.0 0.0% filepath::parse_bits
> >> > 0.0 0.0% 100.0% 0.0 0.0% inode_t::operator=
> >> > 0.0 0.0% 100.0% 0.0 0.0% operator<<@2a2890
> >> > 0.0 0.0% 100.0% 0.0 0.0% operator<<@2c9760
> >> > 0.0 0.0% 100.0% 0.0 0.0% operator<<@3eadf0
> >> > 0.0 0.0% 100.0% 400.0 100.0% start_thread
> >> > 0.0 0.0% 100.0% 0.1 0.0% std::__cxx11::basic_string::_M_append
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__cxx11::basic_string::_M_replace_aux
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__cxx11::list::operator=
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::__ostream_insert
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::basic_streambuf::xsputn
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::num_put::_M_insert_int
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::num_put::do_put
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::operator<<
> >> > 0.0 0.0% 100.0% 0.0 0.0% std::ostream::_M_insert
> >> > 0.0 0.0% 100.0% 365.6 91.4% std::this_thread::__sleep_for
> >> > 0.0 0.0% 100.0% 0.0 0.0% utime_t::localtime
> >> > 0.0 0.0% 100.0% 0.1 0.0% void finish_contexts@2a30f0
> >> > ------------------------------------------------------------ ------------------------------ --
> >> > ------------------------------------------------------------ ------------------------------ --
> >> > ------------------------------------------------------------ ------------------------------ --
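For a listing this long, a small parser can help pull out the largest allocators. The sketch below assumes the usual pprof text column order (flat MB, flat %, running sum %, cumulative MB, cumulative %, symbol); the function names are hypothetical helpers, not part of pprof.

```python
import re

# One pprof --text row: flat, flat%, sum%, cum, cum%, symbol name.
ROW_RE = re.compile(
    r"(\d+\.\d+)\s+(\d+\.\d+)%\s+(\d+\.\d+)%\s+(\d+\.\d+)\s+(\d+\.\d+)%\s+(.+)"
)


def parse_pprof_text(text):
    """Parse pprof --text output into a list of dicts, one per symbol row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if not match:
            continue  # headers such as "Total: 400.0 MB" are skipped
        flat, flat_pct, _sum_pct, cum, cum_pct, name = match.groups()
        rows.append({
            "name": name.strip(),
            "flat_mb": float(flat),
            "flat_pct": float(flat_pct),
            "cum_mb": float(cum),
            "cum_pct": float(cum_pct),
        })
    return rows


def top_by_flat(rows, n=3):
    """Return the n rows that allocated the most memory directly (flat MB)."""
    return sorted(rows, key=lambda r: r["flat_mb"], reverse=True)[:n]
```

Applied to the output above, the top row by flat size is ceph::buffer::create_aligned_in_mempool at 362.5 MB, and the rows with a large cumulative but zero flat value (for example AsyncConnection::process) show which call path leads there.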
> >> >
> >> >
> >> > Greetings!!
> >> >
> >> > 2018-07-24 12:07 GMT+02:00 Yan, Zheng <ukernel@xxxxxxxxx>:
> >> >>
> >> >> On Tue, Jul 24, 2018 at 4:59 PM Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >> >> >
> >> >> > Hello,
> >> >> >
> >> >> > How long does it need to run? This is a production environment, and the memory profiler combined with the low cache size (set because of the problem) causes a lot of CPU usage on the OSDs and the MDS, which makes it fail while the profiler is running. Is there any problem if it is done at a low-traffic time? (Less usage, so maybe it won't fail, but maybe less info about usage.)
> >> >> >
> >> >>
> >> >> just one time, wait a few minutes between start_profiler and stop_profiler
> >> >>
> >> >> > Greetings!
> >> >> >
> >> >> > 2018-07-24 10:21 GMT+02:00 Yan, Zheng <ukernel@xxxxxxxxx>:
> >> >> >>
> >> >> >> I mean:
> >> >> >>
> >> >> >> ceph tell mds.x heap start_profiler
> >> >> >>
> >> >> >> ... wait for some time
> >> >> >>
> >> >> >> ceph tell mds.x heap stop_profiler
> >> >> >>
> >> >> >> pprof --text /usr/bin/ceph-mds
> >> >> >> /var/log/ceph/ceph-mds.x.profile.<largest number>.heap
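The "<largest number>" part of the file name can be picked programmatically. A minimal sketch, assuming heap dumps are named like the ones in this thread (ending in `.profile.NNNN.heap`); `latest_heap_file` is a hypothetical helper name.

```python
import re

# Heap dumps in this thread end in ".profile.<number>.heap".
HEAP_RE = re.compile(r"\.profile\.(\d+)\.heap$")


def latest_heap_file(filenames):
    """Return the heap dump with the highest sequence number, or None."""
    numbered = []
    for name in filenames:
        match = HEAP_RE.search(name)
        if match:
            # Compare numerically so that 0010 sorts after 0009.
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]
```

Feeding it the result of `os.listdir()` for the log directory gives the file name to pass to pprof.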
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >> >> >> >
> >> >> >> > This is what I get:
> >> >> >> >
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
> >> >> >> > 2018-07-24 09:05:19.350720 7fc562ffd700 0 client.1452545 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > 2018-07-24 09:05:29.103903 7fc563fff700 0 client.1452548 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
> >> >> >> > ------------------------------------------------
> >> >> >> > MALLOC: 760199640 ( 725.0 MiB) Bytes in use by application
> >> >> >> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> >> >> >> > MALLOC: + 246962320 ( 235.5 MiB) Bytes in central cache freelist
> >> >> >> > MALLOC: + 43933664 ( 41.9 MiB) Bytes in transfer cache freelist
> >> >> >> > MALLOC: + 41012664 ( 39.1 MiB) Bytes in thread cache freelists
> >> >> >> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
> >> >> >> > MALLOC: + 4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> >> >> >> > MALLOC:
> >> >> >> > MALLOC: 33027 Spans in use
> >> >> >> > MALLOC: 19 Thread heaps in use
> >> >> >> > MALLOC: 8192 Tcmalloc page size
> >> >> >> > ------------------------------------------------
> >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
> >> >> >> >
> >> >> >> >
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> >> >> >> > 2018-07-24 09:14:25.747706 7f94fffff700 0 client.1452578 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > 2018-07-24 09:14:25.754034 7f95057fa700 0 client.1452581 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------------------------------
> >> >> >> > MALLOC: 960649328 ( 916.1 MiB) Bytes in use by application
> >> >> >> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> >> >> >> > MALLOC: + 108867288 ( 103.8 MiB) Bytes in central cache freelist
> >> >> >> > MALLOC: + 37179424 ( 35.5 MiB) Bytes in transfer cache freelist
> >> >> >> > MALLOC: + 40143000 ( 38.3 MiB) Bytes in thread cache freelists
> >> >> >> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
> >> >> >> > MALLOC: + 4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> >> >> >> > MALLOC:
> >> >> >> > MALLOC: 33028 Spans in use
> >> >> >> > MALLOC: 19 Thread heaps in use
> >> >> >> > MALLOC: 8192 Tcmalloc page size
> >> >> >> > ------------------------------------------------
> >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
> >> >> >> >
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > --------------------------------------------------------
> >> >> >> > After heap release:
> >> >> >> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> >> >> >> > 2018-07-24 09:15:28.540203 7f2f7affd700 0 client.1443339 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > 2018-07-24 09:15:28.547153 7f2f7bfff700 0 client.1443342 ms_handle_reset on 10.22.0.168:6800/1685786126
> >> >> >> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------------------------------
> >> >> >> > MALLOC: 710315776 ( 677.4 MiB) Bytes in use by application
> >> >> >> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> >> >> >> > MALLOC: + 246471880 ( 235.1 MiB) Bytes in central cache freelist
> >> >> >> > MALLOC: + 40802848 ( 38.9 MiB) Bytes in transfer cache freelist
> >> >> >> > MALLOC: + 38689304 ( 36.9 MiB) Bytes in thread cache freelists
> >> >> >> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 1046466720 ( 998.0 MiB) Actual memory used (physical + swap)
> >> >> >> > MALLOC: + 4324163584 ( 4123.8 MiB) Bytes released to OS (aka unmapped)
> >> >> >> > MALLOC: ------------
> >> >> >> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> >> >> >> > MALLOC:
> >> >> >> > MALLOC: 33177 Spans in use
> >> >> >> > MALLOC: 19 Thread heaps in use
> >> >> >> > MALLOC: 8192 Tcmalloc page size
> >> >> >> > ------------------------------------------------
> >> >> >> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> >> >> >> > Bytes released to the OS take up virtual address space but no physical memory.
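To compare the three dumps above, the tcmalloc summary can be parsed into numbers. The sketch below only relies on the line layout visible in this thread ("MALLOC: [+|=] <bytes> ( <MiB> MiB) <label>"); the function names are hypothetical.

```python
import re

# Matches e.g. "MALLOC: + 246962320 ( 235.5 MiB) Bytes in central cache freelist".
LINE_RE = re.compile(
    r"MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+\s*MiB\)\s*(.+)"
)


def parse_heap_stats(text):
    """Map each labelled MALLOC line to its byte count."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def freelist_bytes(stats):
    """Sum the bytes held in tcmalloc freelists (page heap, central, transfer, thread)."""
    return sum(value for label, value in stats.items() if "freelist" in label)
```

For the first dump this gives 331908648 bytes in freelists, next to 760199640 bytes in use by the application; together with the malloc metadata they add up to the reported "Actual memory used" figure.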
> >> >> >> >
> >> >> >> >
> >> >> >> > The other commands fail with a curl error:
> >> >> >> > Failed to get profile: curl 'http:///pprof/profile?seconds=30' > /root/pprof/.tmp.ceph-mds.1532416424.:
> >> >> >> >
> >> >> >> >
> >> >> >> > Greetings!!
> >> >> >> >
> >> >> >> > 2018-07-24 5:35 GMT+02:00 Yan, Zheng <ukernel@xxxxxxxxx>:
> >> >> >> >>
> >> >> >> >> Could you profile the memory allocation of the MDS?
> >> >> >> >>
> >> >> >> >> http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
> >> >> >> >> On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >> >> >> >> >
> >> >> >> >> > Yeah, that is also my thread. It was created before I lowered the cache size from 512Mb to 8Mb. I thought it might be my fault and that I had misconfigured something, so I ignored the problem until now.
> >> >> >> >> >
> >> >> >> >> > Greetings!
> >> >> >> >> >
> >> >> >> >> > On Tue, Jul 24, 2018 at 1:00, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >> >> >> >> >>
> >> >> >> >> >> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >> >> >> >> >>>
> >> >> >> >> >>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> >> >> >> >> >>> > Hi, thanks for your response.
> >> >> >> >> >>> >
> >> >> >> >> >>> > There are about 6 clients, and 4 of them are on standby most of the time. Only two
> >> >> >> >> >>> > are active servers serving the webpage. We also have Varnish in
> >> >> >> >> >>> > front, so they are not getting all the load (below 30% in PHP is not much).
> >> >> >> >> >>> > As for the MDS cache, I now have mds_cache_memory_limit at 8Mb.
> >> >> >> >> >>>
> >> >> >> >> >>> What! Please post `ceph daemon mds.<name> config diff`, `... perf
> >> >> >> >> >>> dump`, and `... dump_mempools ` from the server the active MDS is on.
> >> >> >> >> >>>
> >> >> >> >> >>> > I've also tested
> >> >> >> >> >>> > 512Mb, but the CPU usage is the same and the MDS RAM usage grows up to
> >> >> >> >> >>> > 15GB (on a 16Gb server it starts to swap and everything fails). With 8Mb, at least
> >> >> >> >> >>> > the memory usage is stable at less than 6Gb (it is now using about 1GB of RAM).
> >> >> >> >> >>>
> >> >> >> >> >>> We've seen reports of possible memory leaks before and the potential
> >> >> >> >> >>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
> >> >> >> >> >>> Your MDS cache size should be configured to 1-8GB (depending on your
> >> >> >> >> >>> preference) so it's disturbing to see you set it so low.
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> See also the thread "Fwd: MDS memory usage is very high", which had more discussion of that. The MDS daemon seemingly had 9.5GB of allocated RSS but only believed 489MB was in use for the cache...
> >> >> >> >> >> -Greg
> >> >> >> >> >>
> >> >> >> >> >>>
> >> >> >> >> >>>
> >> >> >> >> >>> --
> >> >> >> >> >>> Patrick Donnelly
> >> >> >> >> >>> _______________________________________________
> >> >> >> >> >>> ceph-users mailing list
> >> >> >> >> >>> ceph-users@xxxxxxxxxxxxxx
> >> >> >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >> >> >
> >> >> >> >> > _______________________________________________
> >> >> >> >> > ceph-users mailing list
> >> >> >> >> > ceph-users@xxxxxxxxxxxxxx
> >> >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > _________________________________________
> >> >> >> >
> >> >> >> > Daniel Carrasco Marín
> >> >> >> > Ingeniería para la Innovación i2TIC, S.L.
> >> >> >> > Tlf: +34 911 12 32 84 Ext: 223
> >> >> >> > www.i2tic.com
> >> >> >> > _________________________________________
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >
> >
> >
> >
_________________________________________
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com