Hi Kevin,
Do you have an update on the state of the cluster? I've opened a ticket http://tracker.ceph.com/issues/36163 to track the likely root cause we identified, and have a PR open at https://github.com/ceph/ceph/pull/24247
Thanks!
sage
On Thu, 20 Sep 2018, Sage Weil wrote: > On Thu, 20 Sep 2018, KEVIN MICHAEL HRPCEK wrote: > > Top results when both were taken with ms_dispatch at 100%. The mon one > > changes a lot so I've included 3 snapshots of those. I'll update > > mon_osd_cache_size. > > > > After disabling auth_cluster_required and a cluster reboot I am having > > fewer problems keeping OSDs in the cluster since they seem to not be > > having auth problems around the 2 hour uptime mark. The mons still have > > their problems but 859/861 OSDs are up with 2 crashing. I found a brief > > mention on a forum or somewhere that the mons will only trim their > > storedb when the cluster is healthy. If that's true, do you think it is > > likely that once all osds are healthy and I unset some no* cluster flags > > the mons will be able to trim their db and the result will be that > > ms_dispatch no longer takes too long to churn through the db? Our primary > > theory here is that ms_dispatch is taking too long and the mons reach a > > timeout and then reelect in a nonstop cycle. > > It's the PGs that need to all get healthy (active+clean) before the > osdmaps get trimmed. Other health warnings (e.g. about noout being set) > aren't related. > > > ceph-mon > > 34.24% 34.24% libpthread-2.17.so [.] pthread_rwlock_rdlock > > + 34.00% 34.00% libceph-common.so.0 [.] crush_hash32_3 > > If this is the -g output, you need to hit enter on lines like this to see > the call graph... Or you can do 'perf record -g -p <pid>' and then 'perf > report --stdio' (or similar) to dump it all to a file, fully expanded. > > Thanks! > sage > > > + 5.01% 5.01% libceph-common.so.0 [.] ceph::decode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > + 4.66% 0.00% libceph-common.so.0 [.] ceph::buffer::raw_combined::~raw_combined > > + 2.37% 0.00% [unknown] [.] 0000000000000000 > > + 2.21% 2.21% libceph-common.so.0 [.] ceph::buffer::ptr::release > > + 2.14% 2.14% libceph-common.so.0 [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 1.88% 1.08% libc-2.17.so [.] __memcpy_ssse3_back > > + 1.70% 1.70% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 1.65% 1.65% libceph-common.so.0 [.] ceph::buffer::list::append > > + 1.54% 1.54% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment > > 1.44% 1.44% libstdc++.so.6.0.19 [.] std::_Rb_tree_insert_and_rebalance > > + 1.28% 1.28% libceph-common.so.0 [.] ceph::buffer::ptr::append > > + 1.23% 1.23% libceph-common.so.0 [.] ceph::buffer::list::append > > + 0.98% 0.98% libceph-common.so.0 [.] ceph::buffer::ptr::ptr > > + 0.79% 0.79% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::copy > > + 0.79% 0.79% libceph-common.so.0 [.] 
ceph::buffer::list::iterator_impl<false>::advance > > + 0.66% 0.66% libceph-common.so.0 [.] ceph::buffer::ptr::unused_tail_length > > + 0.64% 0.64% libceph-common.so.0 [.] ceph::encode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > + 0.60% 0.60% libceph-common.so.0 [.] ceph::buffer::ptr::operator= > > 0.53% 0.53% libtcmalloc.so.4.4.5 [.] tcmalloc::CentralFreeList::FetchFromOneSpans > > 0.52% 0.52% libceph-common.so.0 [.] ceph::buffer::ptr::copy_out > > + 0.51% 0.51% libtcmalloc.so.4.4.5 [.] operator new[] > > + 0.49% 0.49% libceph-common.so.0 [.] ceph::decode<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, denc_traits<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> > > > 0.49% 0.49% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 0.29% 0.29% ceph-mon [.] ceph::buffer::list::contiguous_appender::~contiguous_appender > > 0.27% 0.27% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::advance@plt > > + 0.27% 0.27% [kernel] [k] retint_userspace_restore_args > > > > ceph-mon > > Children Self Shared Object Symbol > > + 12.36% 12.36% libceph-common.so.0 [.] ceph::buffer::list::append > > + 11.58% 11.58% libceph-common.so.0 [.] ceph::decode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > + 8.76% 0.00% libceph-common.so.0 [.] ceph::buffer::raw_combined::~raw_combined > > + 7.10% 7.10% libceph-common.so.0 [.] ceph::buffer::list::append > > + 6.51% 6.51% libceph-common.so.0 [.] ceph::buffer::ptr::ptr > > + 6.29% 6.29% libceph-common.so.0 [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 5.73% 5.73% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 5.06% 5.06% libceph-common.so.0 [.] ceph::encode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > + 4.35% 4.35% libceph-common.so.0 [.] ceph::buffer::ptr::release > > + 3.10% 0.00% libc-2.17.so [.] __memcpy_ssse3_back > > 2.94% 2.94% libstdc++.so.6.0.19 [.] std::_Rb_tree_insert_and_rebalance > > + 2.80% 0.00% [unknown] [.] 0000000000000000 > > + 2.72% 2.72% libceph-common.so.0 [.] ceph::decode<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, denc_traits<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> > > > + 2.35% 2.35% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment > > 2.22% 2.22% libceph-common.so.0 [.] 
ceph::buffer::ptr::copy_out > > 2.09% 2.09% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::advance@plt > > + 2.01% 2.01% [kernel] [k] retint_userspace_restore_args > > + 1.95% 1.95% ceph-mon [.] ceph::buffer::list::contiguous_appender::~contiguous_appender > > + 1.89% 1.89% libceph-common.so.0 [.] crush_hash32_3 > > 1.83% 1.83% libceph-common.so.0 [.] crc32_iscsi_00 > > + 1.60% 1.60% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::copy > > + 1.57% 1.57% libtcmalloc.so.4.4.5 [.] operator new[] > > + 1.40% 1.40% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::advance > > + 1.32% 1.32% libceph-common.so.0 [.] ceph::buffer::ptr::end_c_str > > + 1.23% 1.23% libceph-common.so.0 [.] ceph::buffer::ptr::append > > > > ceph-mon > > Children Self Shared Object Symbol > > + 32.46% 0.00% libceph-common.so.0 [.] OSDMap::_pg_to_up_acting_osds > > + 32.46% 32.46% libceph-common.so.0 [.] crush_init_workspace > > + 28.13% 28.13% libceph-common.so.0 [.] crush_hash32_3 > > + 18.91% 0.00% [kernel] [k] system_call_fastpath > > + 16.23% 0.00% [kernel] [k] unmap_region > > + 16.23% 16.23% [kernel] [k] unmap_page_range > > + 10.55% 10.55% [kernel] [k] system_call_after_swapgs > > + 7.25% 0.00% [unknown] [.] 0000000000000000 > > + 7.25% 7.25% ceph-mon [.] OSDMap::deepish_copy_from > > + 1.74% 1.74% libceph-common.so.0 [.] AsyncConnection::read_bulk > > + 1.45% 1.45% [kernel] [k] ___sys_sendmsg > > + 1.23% 1.23% [kernel] [k] native_sched_clock > > + 1.23% 0.00% [kernel] [k] sched_clock > > + 1.23% 0.00% [kernel] [k] __schedule > > + 1.23% 0.00% [kernel] [k] __perf_event_task_sched_out > > + 1.23% 0.00% [kernel] [k] ctx_sched_out > > + 0.95% 0.95% libceph-common.so.0 [.] ceph::buffer::list::append > > > > > > > > ms_dispatch thread > > Children Self Shared Object Symbol > > + 15.07% 15.07% libceph-common.so.0 [.] ceph::decode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > + 10.24% 0.00% libceph-common.so.0 [.] ceph::buffer::raw_combined::~raw_combined > > + 7.27% 7.27% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 6.19% 6.19% libceph-common.so.0 [.] ceph::buffer::ptr::release > > + 4.84% 4.84% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment > > + 4.59% 4.59% libceph-common.so.0 [.] ceph::buffer::list::append > > + 4.20% 4.20% ceph-mon [.] std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::find > > + 4.04% 4.04% libceph-common.so.0 [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > 3.94% 3.94% libtcmalloc.so.4.4.5 [.] tcmalloc::CentralFreeList::FetchFromOneSpans > > + 3.71% 3.71% libceph-common.so.0 [.] ceph::buffer::ptr::ptr > > + 3.43% 3.43% libceph-common.so.0 [.] ceph::buffer::list::append > > + 3.03% 1.18% libc-2.17.so [.] 
__memcpy_ssse3_back > > + 2.66% 0.00% [kernel] [k] do_futex > > + 2.66% 0.00% [kernel] [k] futex_wait_queue_me > > + 2.66% 1.15% [kernel] [k] timerqueue_add > > + 2.49% 2.49% libceph-common.so.0 [.] ceph::decode<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, denc_traits<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> > > > + 2.02% 2.02% libceph-common.so.0 [.] ceph::buffer::ptr::unused_tail_length > > + 1.99% 1.99% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::copy > > + 1.67% 0.00% [unknown] [k] 0000000000000000 > > 1.64% 1.64% libstdc++.so.6.0.19 [.] std::_Rb_tree_insert_and_rebalance > > + 1.57% 1.57% libtcmalloc.so.4.4.5 [.] operator new[] > > + 1.56% 1.56% libceph-common.so.0 [.] ceph::buffer::ptr::copy_out > > + 1.55% 1.55% libceph-common.so.0 [.] ceph::buffer::ptr::append > > 1.53% 1.53% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::copy@plt > > + 1.51% 1.51% [kernel] [k] rb_insert_color > > + 1.36% 1.36% libceph-common.so.0 [.] ceph::buffer::list::iterator_impl<false>::advance > > + 1.27% 1.27% libceph-common.so.0 [.] ceph::encode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t) > > 1.13% 1.13% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, std::less<p > > + 1.10% 1.10% ceph-mon [.] entity_addr_t::encode > > 1.08% 1.08% libceph-common.so.0 [.] ceph::buffer::ptr::unused_tail_length@plt > > 1.04% 1.04% libceph-common.so.0 [.] ceph::buffer::list::contiguous_appender::~contiguous_appender@plt > > 1.03% 1.03% libtcmalloc.so.4.4.5 [.] tcmalloc::ThreadCache::ReleaseToCentralCache > > + 1.03% 1.03% libtcmalloc.so.4.4.5 [.] tcmalloc::CentralFreeList::ReleaseToSpans > > 0.92% 0.92% libceph-common.so.0 [.] ceph::buffer::list::iterator::advance > > > > > > > > > > On 09/20/2018 08:21 AM, Sage Weil wrote: > > > > Can you try 'perf top -g -p <pid>' to see where all of the encoding > > activity is coming from? I see two possibilities (the mon attempts to > > cache encoded maps, and the MOSDMap message itself will also reencode > > if/when that fails). > > > > Also: mon_osd_cache_size = 10 by default... try making that 500 or > > something. > > > > sage > > > > > > > > On Wed, 19 Sep 2018, Kevin Hrpcek wrote: > > > > > > > > Majority of the clients are luminous with a few kraken stragglers. I looked at > > ceph features and 'ceph daemon mon.sephmon1 sessions'. Nothing is reporting as > > having mimic features, all mon,mgr,osd are running 13.2.1 but are reporting > > luminous features, and majority of the luminous clients are reporting jewel > > features. I shut down my compute cluster to get rid of majority of the clients > > that are reporting jewel features, and there is still a lot of time spent by > > ms_dispatch in ceph::decode<pg_t....... > > > > My small mimic test cluster actually shows similar in it's features, > > mon,mgr,mds,osd all report luminous features yet have 13.2.1 installed, so > > maybe that is normal. 
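For anyone trying to reproduce this kind of capture, here is a minimal sketch of the profiling and feature-check commands suggested above; the PID lookup, the 60-second duration, and the /tmp output paths are placeholders, and the ceph-mon-dbg / ceph-debuginfo packages mentioned earlier may be needed to get readable symbol names:

# interactive view with call graphs, as suggested above
perf top -g -p "$(pidof ceph-mon)"

# or record for ~60 seconds and dump the fully expanded call graph to a file
perf record -g -p "$(pidof ceph-mon)" -o /tmp/ceph-mon.perf.data -- sleep 60
perf report --stdio -i /tmp/ceph-mon.perf.data > /tmp/ceph-mon.perf.txt

# cross-check which feature bits daemons and clients report, per the discussion above
ceph features
ceph daemon mon.sephmon1 sessions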
> > > > Kevin > > > > On 09/19/2018 09:35 AM, Sage Weil wrote: > > > > > > It's hard to tell exactly from the below, but it looks to me like there is > > still a lot of OSDMap reencoding going on. Take a look at 'ceph features' > > output and see who in the cluster is using pre-luminous features.. I'm > > guessing all of the clients? For any of those sessions, fetching OSDMaps > > from the cluster will require reencoding. > > > > If it's all clients (well, non-OSDs), I think we could work around it by > > avoiding the reencode entirely (it is only really there for OSDs, which > > want a perfect OSDMap copy that will match the monitor's CRC). > > > > sage > > > > On Wed, 19 Sep 2018, KEVIN MICHAEL HRPCEK wrote: > > > > > > > > I set mon lease = 30 yesterday and it had no effect on the quorum > > election. To give you an idea of how much cpu ms_dispatch is using, from > > the last mon restart about 7.5 hours ago, the ms_dispatch thread has 5h > > 40m of cpu time. Below are 2 snippets from perf top. I took them while > > ms_dispatch was 100% of a core, the first is using the pid of the > > ceph-mon, the second is the pid of the ms_dispatch thread. The last thing > > is a snippet from stracing the ms_dispatch pid. It is running through all > > of the sst files. > > > > perf top ceph-mon > > Overhead Shared Object Symbol > > 17.71% libceph-common.so.0 [.] ceph::decode<pg_t, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, > > std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > > > 6.67% libceph-common.so.0 [.] ceph::buffer::ptr::release > > 5.35% libceph-common.so.0 [.] std::_Rb_tree<pg_t, std::pair<pg_t > > const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, > > int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, > > std::less<pg_t>, mempoo > > 5.20% libceph-common.so.0 [.] ceph::buffer::ptr::append > > 5.12% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::copy > > 4.66% libceph-common.so.0 [.] ceph::buffer::list::append > > 4.33% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment > > 4.27% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t > > const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, > > int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, > > std::less<pg_t>, mempoo > > 4.18% libceph-common.so.0 [.] ceph::buffer::list::append > > 3.10% libceph-common.so.0 [.] ceph::decode<std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > denc_traits<std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> > > > 2.90% libceph-common.so.0 [.] ceph::encode<pg_t, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, > > std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > > > 2.56% libceph-common.so.0 [.] ceph::buffer::ptr::ptr > > 2.50% libstdc++.so.6.0.19 [.] std::_Rb_tree_insert_and_rebalance > > 2.39% libceph-common.so.0 [.] ceph::buffer::ptr::copy_out > > 2.33% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::advance > > 2.21% libtcmalloc.so.4.4.5 [.] 
> > tcmalloc::CentralFreeList::FetchFromOneSpans > > 1.97% libtcmalloc.so.4.4.5 [.] > > tcmalloc::CentralFreeList::ReleaseToSpans > > 1.60% libceph-common.so.0 [.] crc32_iscsi_00 > > 1.42% libtcmalloc.so.4.4.5 [.] operator new[] > > 1.29% libceph-common.so.0 [.] ceph::buffer::ptr::unused_tail_length > > 1.28% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::copy_shallow > > 1.25% libceph-common.so.0 [.] ceph::buffer::ptr::raw_length@plt > > 1.06% libceph-common.so.0 [.] ceph::buffer::ptr::end_c_str > > 1.06% libceph-common.so.0 [.] ceph::buffer::list::iterator::copy > > 0.99% libc-2.17.so [.] __memcpy_ssse3_back > > 0.94% libc-2.17.so [.] _IO_default_xsputn > > 0.89% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::advance@plt > > 0.87% libtcmalloc.so.4.4.5 [.] > > tcmalloc::ThreadCache::ReleaseToCentralCache > > 0.76% libleveldb.so.1.0.7 [.] leveldb::FindFile > > 0.72% [vdso] [.] __vdso_clock_gettime > > 0.67% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t > > const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, > > int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, > > std::less<pg_t>, mempoo > > 0.63% libtcmalloc.so.4.4.5 [.] tc_deletearray_nothrow > > 0.59% libceph-common.so.0 [.] ceph::buffer::list::iterator::advance > > 0.52% libceph-common.so.0 [.] > > ceph::buffer::list::iterator::get_current_ptr > > > > > > perf top ms_dispatch > > 11.88% libceph-common.so.0 [.] ceph::decode<pg_t, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, > > std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > > > 11.23% [kernel] [k] system_call_after_swapgs > > 9.36% libceph-common.so.0 [.] crush_hash32_3 > > 6.55% libceph-common.so.0 [.] crush_choose_indep > > 4.39% [kernel] [k] smp_call_function_many > > 3.17% libceph-common.so.0 [.] ceph::buffer::list::append > > 3.03% libceph-common.so.0 [.] ceph::buffer::list::append > > 3.02% libceph-common.so.0 [.] std::_Rb_tree<pg_t, std::pair<pg_t > > const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, > > int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, > > std::less<pg_t>, mempoo > > 2.92% libceph-common.so.0 [.] ceph::buffer::ptr::release > > 2.65% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::advance > > 2.57% ceph-mon [.] std::_Rb_tree<pg_t, std::pair<pg_t > > const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, > > int> > >, std::_Select1st<std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, > > std::less<pg_t>, mempoo > > 2.27% libceph-common.so.0 [.] ceph::buffer::ptr::ptr > > 1.99% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment > > 1.93% libc-2.17.so [.] __memcpy_ssse3_back > > 1.91% libceph-common.so.0 [.] ceph::buffer::ptr::append > > 1.87% libceph-common.so.0 [.] crush_hash32_3@plt > > 1.84% libceph-common.so.0 [.] > > ceph::buffer::list::iterator_impl<false>::copy > > 1.75% libtcmalloc.so.4.4.5 [.] > > tcmalloc::CentralFreeList::FetchFromOneSpans > > 1.63% libceph-common.so.0 [.] 
ceph::encode<pg_t, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, > > std::pair<pg_t const, std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> > > > > 1.57% libceph-common.so.0 [.] ceph::buffer::ptr::copy_out > > 1.55% libstdc++.so.6.0.19 [.] std::_Rb_tree_insert_and_rebalance > > 1.47% libceph-common.so.0 [.] ceph::buffer::ptr::raw_length > > 1.33% libtcmalloc.so.4.4.5 [.] tc_deletearray_nothrow > > 1.09% libceph-common.so.0 [.] ceph::decode<std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, > > denc_traits<std::vector<int, > > mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> > > > 1.07% libtcmalloc.so.4.4.5 [.] operator new[] > > 1.02% libceph-common.so.0 [.] ceph::buffer::list::iterator::copy > > 1.01% libtcmalloc.so.4.4.5 [.] tc_posix_memalign > > 0.85% ceph-mon [.] ceph::buffer::ptr::release@plt > > 0.76% libceph-common.so.0 [.] ceph::buffer::ptr::copy_out@plt > > 0.74% libceph-common.so.0 [.] crc32_iscsi_00
> >
> > strace
> > munmap(0x7f2eda736000, 2463941) = 0
> > open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", O_RDONLY) = 429
> > stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299339.sst", {st_mode=S_IFREG|0644, st_size=1658656, ...}) = 0
> > mmap(NULL, 1658656, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eea87e000
> > close(429) = 0
> > munmap(0x7f2ea8c97000, 2468005) = 0
> > open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) = 429
> > stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", {st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0
> > mmap(NULL, 2484001, PROT_READ, MAP_SHARED, 429, 0) = 0x7f2eda74b000
> > close(429) = 0
> > munmap(0x7f2ee21dc000, 2472343) = 0
> >
> > Kevin
> > On 09/19/2018 06:50 AM, Sage Weil wrote: > > > > On Wed, 19 Sep 2018, KEVIN MICHAEL HRPCEK wrote: > > > > > > Sage, > > > > Unfortunately the mon election problem came back yesterday and it makes > > it really hard to get a cluster to stay healthy. A brief unexpected > > network outage occurred and sent the cluster into a frenzy and when I > > had it 95% healthy the mons started their nonstop reelections. In the > > previous logs I sent were you able to identify why the mons are > > constantly electing? The elections seem to be triggered by the below > > paxos message but do you know which lease timeout is being reached or > > why the lease isn't renewed instead of calling for an election? > > > > One thing I tried was to shut down the entire cluster and bring up only > > the mon and mgr. The mons weren't able to hold their quorum with no osds > > running and the ceph-mon ms_dispatch thread runs at 100% for > 60s at a > > time. > > > > > > > > This is odd... with no other daemons running I'm not sure what would be > > eating up the CPU. Can you run a 'perf top -p `pidof ceph-mon`' (or > > similar) on the machine to see what the process is doing? You might need > > to install ceph-mon-dbg or ceph-debuginfo to get better symbols. > > > > > > > > 2018-09-19 03:56:21.729 7f4344ec1700 1 mon.sephmon2@1(peon).paxos(paxos > > active c 133382665..133383355) lease_timeout -- calling new election > > > > > > > > A workaround is probably to increase the lease timeout. Try setting > > mon_lease = 15 (default is 5... could also go higher than 15) in the > > ceph.conf for all of the mons.
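For reference, a minimal sketch of what that workaround might look like in the [mon] section of ceph.conf, together with the other mon-side knobs suggested elsewhere in this thread (mon_osd_cache_size and rocksdb_cache_size); these are only the values floated in this discussion, not tested recommendations, and the mons need a restart to pick them up:

[mon]
mon lease = 15                   # default 5; band-aid for the lease timeouts / constant re-elections
mon osd cache size = 500         # default 10; lets the mon cache more encoded osdmaps
rocksdb cache size = 1342177280  # ~1.3 GB, 10x the default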
This is a bit of a band-aid but should > > help you keep the mons in quorum until we sort out what is going on. > > sage > > > > > > > > > > Thanks > > Kevin > > > > On 09/10/2018 07:06 AM, Sage Weil wrote: > > > > I took a look at the mon log you sent. A few things I noticed: > > > > - The frequent mon elections seem to get only 2/3 mons about half of the > > time. > > - The messages coming in are mostly osd_failure, and half of those seem to > > be recoveries (cancellation of the failure message). > > > > It does smell a bit like a networking issue, or some tunable that relates > > to the messaging layer. It might be worth looking at an OSD log for an > > osd that reported a failure and seeing what error code it is coming up with on the > > failed ping connection? That might provide a useful hint (e.g., > > ECONNREFUSED vs EMFILE or something). > > > > I'd also confirm that with nodown set the mon quorum stabilizes... > > > > sage > > > > > > > > > > On Mon, 10 Sep 2018, Kevin Hrpcek wrote: > > > > > > > > Update for the list archive. > > > > I went ahead and finished the mimic upgrade with the osds in a fluctuating > > state of up and down. The cluster did start to normalize a lot more easily > > after > > everything was on mimic since the random mass OSD heartbeat failures > > stopped > > and the constant mon election problem went away. I'm still battling with > > the > > cluster reacting poorly to host reboots or small map changes, but I feel > > like > > my current pg:osd ratio may be a factor in that since we are 2x > > normal > > pg count while migrating data to new EC pools. > > > > I'm not sure of the root cause but it seems like the mix of luminous and > > mimic > > did not play well together for some reason. Maybe it has to do with the > > scale > > of my cluster, 871 osds, or maybe I've missed some tuning as my > > cluster > > has scaled to this size. > > > > Kevin > > > > > > On 09/09/2018 12:49 PM, Kevin Hrpcek wrote: > > > > > > Nothing too crazy for non-default settings. Some of those osd settings > > were > > in place while I was testing recovery speeds and need to be brought back > > closer to defaults. I was setting nodown before but it seems to mask the > > problem. While it's good to stop the osdmap changes, OSDs would come up, > > get > > marked up, but at some point go down again (but the process is still > > running) and still stay up in the map. Then when I'd unset nodown the > > cluster would immediately mark 250+ osds down again and I'd be back where I > > started. > > > > This morning I went ahead and finished the osd upgrades to mimic to remove > > that variable. I've looked for networking problems but haven't found any. Two > > of the mons are on the same switch. I've also tried combinations of > > shutting > > down a mon to see if a single one was the problem, but they keep electing > > no > > matter the mix of them that are up. Part of it feels like a networking > > problem but I haven't been able to find a culprit yet as everything was > > working normally before starting the upgrade. Other than the constant mon > > elections, yesterday I had the cluster 95% healthy 3 or 4 times, but it > > doesn't last long since at some point the OSDs start trying to fail each > > other through their heartbeats.
> > 2018-09-09 17:37:29.079 7eff774f5700 1 mon.sephmon1@0(leader).osd e991282 prepare_failure osd.39 10.1.9.2:6802/168438 from osd.49 10.1.9.3:6884/317908 is reporting failure:1
> > 2018-09-09 17:37:29.079 7eff774f5700 0 log_channel(cluster) log [DBG] : osd.39 10.1.9.2:6802/168438 reported failed by osd.49 10.1.9.3:6884/317908
> > 2018-09-09 17:37:29.083 7eff774f5700 1 mon.sephmon1@0(leader).osd e991282 prepare_failure osd.93 10.1.9.9:6853/287469 from osd.372 10.1.9.13:6801/275806 is reporting failure:1
> >
> > I'm working on getting things mostly good again with everything on mimic and will see if it behaves better.
> >
> > Thanks for your input on this David.
> >
> > [global]
> > mon_initial_members = sephmon1, sephmon2, sephmon3
> > mon_host = 10.1.9.201,10.1.9.202,10.1.9.203
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > public_network = 10.1.0.0/16
> > osd backfill full ratio = 0.92
> > osd failsafe nearfull ratio = 0.90
> > osd max object size = 21474836480
> > mon max pg per osd = 350
> >
> > [mon]
> > mon warn on legacy crush tunables = false
> > mon pg warn max per osd = 300
> > mon osd down out subtree limit = host
> > mon osd nearfull ratio = 0.90
> > mon osd full ratio = 0.97
> > mon health preluminous compat warning = false
> > osd heartbeat grace = 60
> > rocksdb cache size = 1342177280
> >
> > [mds]
> > mds log max segments = 100
> > mds log max expiring = 40
> > mds bal fragment size max = 200000
> > mds cache memory limit = 4294967296
> >
> > [osd]
> > osd mkfs options xfs = -i size=2048 -d su=512k,sw=1
> > osd recovery delay start = 30
> > osd recovery max active = 5
> > osd max backfills = 3
> > osd recovery threads = 2
> > osd crush initial weight = 0
> > osd heartbeat interval = 30
> > osd heartbeat grace = 60
> >
> > On 09/08/2018 11:24 PM, David Turner wrote: > > > > > > What osd/mon/etc config settings do you have that are not default? It > > might be worth utilizing nodown to stop osds from marking each other down > > and finish the upgrade to be able to set the minimum osd version to mimic. > > Stop the osds in a node, manually mark them down, start them back up in > > mimic. Depending on how bad things are, setting pause on the cluster to > > just finish the upgrade faster might not be a bad idea either. > > > > This should be a simple question, have you confirmed that there are no > > networking problems between the MONs while the elections are happening? > > > > On Sat, Sep 8, 2018, 7:52 PM Kevin Hrpcek > > <kevin.hrpcek@xxxxxxxxxxxxx> > > wrote: > > > > Hey Sage, > > > > I've posted the file with my email address for the user. It is > > with debug_mon 20/20, debug_paxos 20/20, and debug ms 1/5. The > > mons are calling for elections about every minute so I let this > > run for a few elections and saw this node become the leader a > > couple times. Debug logs start around 23:27:30.
I had managed to > > get about 850/857 osds up, but it seems that within the last 30 > > min it has all gone bad again due to the OSDs reporting each > > other as failed. We relaxed the osd_heartbeat_interval to 30 and > > osd_heartbeat_grace to 60 in an attempt to slow down how quickly > > OSDs are trying to fail each other. I'll put in the > > rocksdb_cache_size setting. > > > > Thanks for taking a look. > > > > Kevin > > > > On 09/08/2018 06:04 PM, Sage Weil wrote: > > > > > > Hi Kevin, > > > > I can't think of any major luminous->mimic changes off the top of my > > head > > that would impact CPU usage, but it's always possible there is > > something > > subtle. Can you ceph-post-file the full log from one of your mons > > (preferably the leader)? > > > > You might try adjusting the rocksdb cache size.. try setting > > > > rocksdb_cache_size = 1342177280 # 10x the default, ~1.3 GB > > > > on the mons and restarting? > > > > Thanks! > > sage > > > > On Sat, 8 Sep 2018, Kevin Hrpcek wrote: > > > > > > > > Hello, > > > > I've had a Luminous -> Mimic upgrade go very poorly and my cluster > > is stuck > > with almost all pgs down. One problem is that the mons have > > started to > > re-elect a new quorum leader almost every minute. This is making > > it difficult > > to monitor the cluster and even run any commands on it since at > > least half the > > time a ceph command times out or takes over a minute to return > > results. I've > > looked at the debug logs and it appears there is some timeout > > occurring with > > paxos of about a minute. The msg_dispatch thread of the mons is > > often running > > a core at 100% for about a minute (user time, no iowait). Running > > strace on it > > shows the process is going through all of the mon db files (about > > 6gb in > > store.db/*.sst). Does anyone have an idea of what this timeout is > > or why my > > mons are always reelecting? One theory I have is that the > > msg_dispatch can't > > process the SSTs fast enough and hits some timeout for a health > > check and the > > mon drops itself from the quorum since it thinks it isn't healthy. > > I've been > > thinking of introducing a new mon to the cluster on hardware with > > a better cpu > > to see if that can process the SSTs within this timeout. > > > > My cluster has the mons,mds,mgr and 30/41 osd servers on mimic, > > and 11/41 osd > > servers on luminous. The original problem started when I restarted > > the osds on > > one of the hosts. The cluster reacted poorly to them going down > > and went into > > a frenzy of taking down other osds and remapping. I eventually got > > that stable > > and the PGs were 90% good with the finish line in sight and then > > the mons > > started their issue of reelecting every minute. Now I can't keep > > any decent > > amount of PGs up for more than a few hours. This started on > > Wednesday..... > > > > Any help would be greatly appreciated. > > > > Thanks, > > Kevin > >
> > --Debug snippet from a mon at reelection time
> > 2018-09-07 20:08:08.655 7f57b92cd700 20 mon.sephmon2@1(leader).mds e14242 maybe_resize_cluster in 1 max 1
> > 2018-09-07 20:08:08.655 7f57b92cd700 4 mon.sephmon2@1(leader).mds e14242 tick: resetting beacon timeouts due to mon delay (slow election?) of 59.8106s seconds
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxosservice(mdsmap 13504..14242) maybe_trim trim_to 13742 would only trim 238 < paxos_service_trim_min 250
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657 auth
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657 check_rotate updated rotating
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxosservice(auth 120594..120657) propose_pending
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657 encode_pending v 120658
> > 2018-09-07 20:08:08.655 7f57b92cd700 5 mon.sephmon2@1(leader).paxos(paxos updating c 132917556..132918214) queue_pending_finisher 0x55dce8e5b370
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxos(paxos updating c 132917556..132918214) trigger_propose not active, will propose later
> > 2018-09-07 20:08:08.655 7f57b92cd700 4 mon.sephmon2@1(leader).mgr e2234 tick: resetting beacon timeouts due to mon delay (slow election?) of 59.8844s seconds
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxosservice(mgr 1513..2234) maybe_trim trim_to 1734 would only trim 221 < paxos_service_trim_min 250
> > 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).health tick
> > 2018-09-07 20:08:08.655 7f57b92cd700 20 mon.sephmon2@1(leader).health check_member_health
> > 2018-09-07 20:08:08.657 7f57bcdd0700 1 -- 10.1.9.202:6789/0 - conn(0x55dcee55be00 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=447 -
> > 2018-09-07 20:08:08.657 7f57bcdd0700 10 mon.sephmon2@1(leader) e17 ms_verify_authorizer 10.1.9.32:6823/4007 osd protocol 0
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader).health check_member_health avail 79% total 40 GiB, used 8.4 GiB, avail 32 GiB
> > 2018-09-07 20:08:08.662 7f57b92cd700 20 mon.sephmon2@1(leader).health check_leader_health
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader).paxosservice(health 1534..1720) maybe_trim trim_to 1715 would only trim 181 < paxos_service_trim_min 250
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader).config tick
> > 2018-09-07 20:08:08.662 7f57b92cd700 20 mon.sephmon2@1(leader) e17 sync_trim_providers
> > 2018-09-07 20:08:08.662 7f57b92cd700 -1 mon.sephmon2@1(leader) e17 get_health_metrics reporting 1940 slow ops, oldest is osd_failure(failed timeout osd.72 10.1.9.9:6800/68904 for 317sec e987498 v987498)
> > 2018-09-07 20:08:08.662 7f57b92cd700 1 mon.sephmon2@1(leader).paxos(paxos updating c 132917556..132918214) accept timeout, calling fresh election
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17 bootstrap
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17 sync_reset_requester
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17 unregister_cluster_logger
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17 cancel_probe_timeout (none scheduled)
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 _reset
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 cancel_probe_timeout (none scheduled)
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 timecheck_finish
> > 2018-09-07 20:08:08.662 7f57b92cd700 15 mon.sephmon2@1(probing) e17 health_tick_stop
> > 2018-09-07 20:08:08.662 7f57b92cd700 15 mon.sephmon2@1(probing) e17 health_interval_stop
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 scrub_event_cancel
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 scrub_reset
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing).paxos(paxos updating c 132917556..132918214) restart -- canceling timeouts
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing).paxosservice(mdsmap 13504..14242) restart
> > 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing).paxosservice(osdmap 917471..987503) restart

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com