Hi We got a pair of OSD located in node that crash randomly since 14.2.2 OS Version : Centos 7.6 There’re a ton of lines before crash , I will unespected: -- 3045> 2019-08-07 00:39:32.013 7fe9a4996700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 -3044> 2019-08-07 00:39:32.013 7fe9a3994700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 -3043> 2019-08-07 00:39:32.033 7fe9a4195700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 -3042> 2019-08-07 00:39:32.033 7fe9a4996700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 -- ----- Some hundred lines of: -164> 2019-08-07 00:47:36.628 7fe9a3994700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 -163> 2019-08-07 00:47:36.632 7fe9a3994700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 -162> 2019-08-07 00:47:36.632 7fe9a3994700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 ----- -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:21.756453) -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:31.756604) -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:41.756788) -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:51.756982) -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:01.757206) -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:11.757364) -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:21.757535) -64> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 -63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150 -62> 2019-08-07 00:51:52.948 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 old cache_size: 1994018210 new cache size: 1992784572 -61> 2019-08-07 00:51:52.948 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1992784572 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654593191 data_alloc: 452984832 data_used: 455929856 -60> 2019-08-07 00:51:57.923 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1994110827 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -59> 2019-08-07 00:51:57.973 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1994110827 new cache size: 1994442069 -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:31.757684) -56> 2019-08-07 00:52:02.933 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1995765747 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -55> 2019-08-07 00:52:02.983 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1995765747 new cache size: 1996096345 -54> 2019-08-07 00:52:07.943 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1997417449 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -53> 2019-08-07 00:52:07.993 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1997417449 new cache size: 1997747404 -52> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: tick -51> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:41.757855) -50> 2019-08-07 00:52:12.952 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1999065941 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -49> 2019-08-07 00:52:13.002 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1999065941 new cache size: 1999395254 -48> 2019-08-07 00:52:17.962 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2000711226 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -47> 2019-08-07 00:52:18.012 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2000711226 new cache size: 2001039899 -46> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: tick -45> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:51.758043) -44> 2019-08-07 00:52:22.971 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2002353314 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -43> 2019-08-07 00:52:23.022 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2002353314 new cache size: 2002681348 -42> 2019-08-07 00:52:27.982 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2003992210 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -41> 2019-08-07 00:52:28.031 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2003992210 new cache size: 2004319607 -40> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: tick -39> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:01.758219) -38> 2019-08-07 00:52:32.991 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2005627920 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -37> 2019-08-07 00:52:33.041 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2005627920 new cache size: 2005954680 -36> 2019-08-07 00:52:38.001 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2007260450 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -35> 2019-08-07 00:52:38.051 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2007260450 new cache size: 2007586575 -34> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: tick -33> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:11.758447) -32> 2019-08-07 00:52:43.011 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2008889806 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -31> 2019-08-07 00:52:43.061 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2008889806 new cache size: 2009215297 -30> 2019-08-07 00:52:48.021 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2010515995 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -29> 2019-08-07 00:52:48.071 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2010515995 new cache size: 2010840853 -28> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: tick -27> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:21.758631) -26> 2019-08-07 00:52:53.031 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2012139023 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -25> 2019-08-07 00:52:53.081 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2012139023 new cache size: 2012463250 -24> 2019-08-07 00:52:58.042 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2013758896 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -23> 2019-08-07 00:52:58.092 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2013758896 new cache size: 2014082492 -22> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: tick -21> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:31.758799) -20> 2019-08-07 00:53:03.052 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2015375620 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -19> 2019-08-07 00:53:03.102 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2015375620 new cache size: 2015698587 -18> 2019-08-07 00:53:08.062 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2016989201 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -17> 2019-08-07 00:53:08.112 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2016989201 new cache size: 2017311541 -16> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: tick -15> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:41.759013) -14> 2019-08-07 00:53:13.071 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2018599645 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -13> 2019-08-07 00:53:13.121 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2018599645 new cache size: 2018921358 -12> 2019-08-07 00:53:18.081 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2020206960 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 -11> 2019-08-07 00:53:18.130 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2020206960 new cache size: 2020528048 -10> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: tick -9> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:51.759214) -8> 2019-08-07 00:53:23.090 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2021811150 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 -7> 2019-08-07 00:53:23.140 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2021811150 new cache size: 2022131613 -6> 2019-08-07 00:53:28.100 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2023412220 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 -5> 2019-08-07 00:53:28.150 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2023412220 new cache size: 2023732060 -4> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: tick -3> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:53:01.759334) -2> 2019-08-07 00:53:33.110 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2025010178 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 -1> 2019-08-07 00:53:33.160 7fe99966c700 5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2025010178 new cache size: 2025329397 0> 2019-08-07 00:53:37.655 7fe987e49700 -1 *** Caught signal (Aborted) ** in thread 7fe987e49700 thread_name:tp_osd_tp ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable) 1: (()+0xf5d0) [0x7fe9a7cba5d0] 2: (pthread_kill()+0x31) [0x7fe9a7cb79d1] 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x466) [0x55fef8748176] 4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x7b) [0x55fef874878b] 5: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xa9e) [0x55fef86085de] 6: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x55fef81cd7ff] 7: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x55fef8118298] 8: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202) [0x55fef81767c2] 9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x55fef83eb490] 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55fef816aef4] 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55fef8769ce3] 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fef876cd80] 13: (()+0x7dd5) [0x7fe9a7cb2dd5] 14: (clone()+0x6d) [0x7fe9a6b7202d] About server load: [root@CEPH008 ~]# top top - 00:57:30 up 186 days, 22 min, 1 user, load average: 11.65, 13.42, 13.51 Tasks: 316 total, 1 running, 315 sleeping, 0 stopped, 0 zombie %Cpu(s): 2.3 us, 1.2 sy, 0.0 ni, 74.1 id, 22.4 wa, 0.0 hi, 0.1 si, 0.0 st KiB Mem : 65737480 total, 431824 free, 49046608 used, 16259048 buff/cache KiB Swap: 29241340 total, 19406504 free, 9834836 used. 15917556 avail Mem Currently the server is doing some deep-scrub that we got off during the last two weeks due a node evict and a new node install. |
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com