14.2.2 - OSD Crash


 



Hi

 

We have a pair of OSDs located in one node that have been crashing randomly since 14.2.2.

 

OS version: CentOS 7.6

 

There are a ton of log lines before the crash; here is an excerpt:

 

--

-3045> 2019-08-07 00:39:32.013 7fe9a4996700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3044> 2019-08-07 00:39:32.013 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3043> 2019-08-07 00:39:32.033 7fe9a4195700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3042> 2019-08-07 00:39:32.033 7fe9a4996700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

--

-----

 

Then several hundred lines of:

-164> 2019-08-07 00:47:36.628 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -163> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -162> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

-----

 

   -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick

   -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:21.756453)

   -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick

   -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:31.756604)

   -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick

   -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:41.756788)

   -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick

   -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:50:51.756982)

   -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick

   -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:01.757206)

   -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick

   -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:11.757364)

   -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick

   -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:21.757535)

   -64> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

   -63> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150

   -62> 2019-08-07 00:51:52.948 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 old cache_size: 1994018210 new cache size: 1992784572

   -61> 2019-08-07 00:51:52.948 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1992784572 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654593191 data_alloc: 452984832 data_used: 455929856

   -60> 2019-08-07 00:51:57.923 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1994110827 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -59> 2019-08-07 00:51:57.973 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1994110827 new cache size: 1994442069

   -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick

   -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:31.757684)

   -56> 2019-08-07 00:52:02.933 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1995765747 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -55> 2019-08-07 00:52:02.983 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1995765747 new cache size: 1996096345

   -54> 2019-08-07 00:52:07.943 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1997417449 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -53> 2019-08-07 00:52:07.993 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1997417449 new cache size: 1997747404

   -52> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: tick

   -51> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:41.757855)

   -50> 2019-08-07 00:52:12.952 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1999065941 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -49> 2019-08-07 00:52:13.002 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 1999065941 new cache size: 1999395254

   -48> 2019-08-07 00:52:17.962 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2000711226 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -47> 2019-08-07 00:52:18.012 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2000711226 new cache size: 2001039899

   -46> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: tick

   -45> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:51:51.758043)

   -44> 2019-08-07 00:52:22.971 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2002353314 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -43> 2019-08-07 00:52:23.022 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2002353314 new cache size: 2002681348

   -42> 2019-08-07 00:52:27.982 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2003992210 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -41> 2019-08-07 00:52:28.031 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2003992210 new cache size: 2004319607

   -40> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: tick

   -39> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:01.758219)

   -38> 2019-08-07 00:52:32.991 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2005627920 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -37> 2019-08-07 00:52:33.041 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2005627920 new cache size: 2005954680

   -36> 2019-08-07 00:52:38.001 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2007260450 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -35> 2019-08-07 00:52:38.051 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2007260450 new cache size: 2007586575

   -34> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: tick

   -33> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:11.758447)

   -32> 2019-08-07 00:52:43.011 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2008889806 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -31> 2019-08-07 00:52:43.061 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2008889806 new cache size: 2009215297

   -30> 2019-08-07 00:52:48.021 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2010515995 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -29> 2019-08-07 00:52:48.071 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2010515995 new cache size: 2010840853

   -28> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: tick

   -27> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:21.758631)

   -26> 2019-08-07 00:52:53.031 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2012139023 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -25> 2019-08-07 00:52:53.081 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2012139023 new cache size: 2012463250

   -24> 2019-08-07 00:52:58.042 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2013758896 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -23> 2019-08-07 00:52:58.092 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2013758896 new cache size: 2014082492

   -22> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: tick

   -21> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:31.758799)

   -20> 2019-08-07 00:53:03.052 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2015375620 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -19> 2019-08-07 00:53:03.102 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2015375620 new cache size: 2015698587

   -18> 2019-08-07 00:53:08.062 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2016989201 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -17> 2019-08-07 00:53:08.112 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2016989201 new cache size: 2017311541

   -16> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: tick

   -15> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:41.759013)

   -14> 2019-08-07 00:53:13.071 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2018599645 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -13> 2019-08-07 00:53:13.121 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2018599645 new cache size: 2018921358

   -12> 2019-08-07 00:53:18.081 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2020206960 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944

   -11> 2019-08-07 00:53:18.130 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2020206960 new cache size: 2020528048

   -10> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: tick

    -9> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:52:51.759214)

    -8> 2019-08-07 00:53:23.090 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2021811150 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -7> 2019-08-07 00:53:23.140 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2021811150 new cache size: 2022131613

    -6> 2019-08-07 00:53:28.100 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2023412220 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -5> 2019-08-07 00:53:28.150 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2023412220 new cache size: 2023732060

    -4> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: tick

    -3> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2019-08-07 00:53:01.759334)

    -2> 2019-08-07 00:53:33.110 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2025010178 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944

    -1> 2019-08-07 00:53:33.160 7fe99966c700  5 bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size: 2025010178 new cache size: 2025329397

     0> 2019-08-07 00:53:37.655 7fe987e49700 -1 *** Caught signal (Aborted) **

in thread 7fe987e49700 thread_name:tp_osd_tp

 

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)

1: (()+0xf5d0) [0x7fe9a7cba5d0]

2: (pthread_kill()+0x31) [0x7fe9a7cb79d1]

3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x466) [0x55fef8748176]

4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x7b) [0x55fef874878b]

5: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xa9e) [0x55fef86085de]

6: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x55fef81cd7ff]

7: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x55fef8118298]

8: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202) [0x55fef81767c2]

9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x55fef83eb490]

10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55fef816aef4]

11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55fef8769ce3]

12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fef876cd80]

13: (()+0x7dd5) [0x7fe9a7cb2dd5]

14: (clone()+0x6d) [0x7fe9a6b7202d]
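For anyone triaging a similar dump, here is a minimal sketch (not a Ceph tool) that tallies the heartbeat_map timeout messages per thread and picks out the suicide timeout that precedes the abort. It assumes the log lines have exactly the shape pasted above; the function name `summarize_heartbeats` and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15
#   ... heartbeat_map clear_timeout '...' had suicide timed out after 150
# The pattern is an assumption based on the log lines quoted in this post.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map (?P<func>\w+) '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had (?P<suicide>suicide )?timed out after (?P<secs>\d+)"
)


def summarize_heartbeats(lines):
    """Return (timeouts, suicides).

    timeouts: Counter keyed by (thread id, grace seconds) for ordinary timeouts.
    suicides: list of (thread id, grace seconds) for suicide timeouts.
    """
    timeouts = Counter()
    suicides = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match is None:
            continue  # not a heartbeat_map message (monclient, bluestore, ...)
        key = (match["thread"], int(match["secs"]))
        if match["suicide"]:
            suicides.append(key)
        else:
            timeouts[key] += 1
    return timeouts, suicides
```

Run against the full OSD log, this should show which tp_osd_tp thread was stuck and for how long before the suicide timeout fired, for example `summarize_heartbeats(open(path))` where `path` is the OSD log file.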

 

About server load: 

 

[root@CEPH008 ~]# top

top - 00:57:30 up 186 days, 22 min,  1 user,  load average: 11.65, 13.42, 13.51

Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie

%Cpu(s):  2.3 us,  1.2 sy,  0.0 ni, 74.1 id, 22.4 wa,  0.0 hi,  0.1 si,  0.0 st

KiB Mem : 65737480 total,   431824 free, 49046608 used, 16259048 buff/cache

KiB Swap: 29241340 total, 19406504 free,  9834836 used. 15917556 avail Mem
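To put the memory figures in perspective, here is a small sketch that parses the two `KiB Mem` / `KiB Swap` lines of the `top` header as pasted above and reports how much swap is in use. The helper names `parse_top_memory` and `swap_used_fraction` are made up for this example, and the parsing assumes the procps `top` layout shown in this post.

```python
import re

# Header lines look like:
#   KiB Mem : 65737480 total,   431824 free, 49046608 used, 16259048 buff/cache
#   KiB Swap: 29241340 total, 19406504 free,  9834836 used. 15917556 avail Mem
LINE_RE = re.compile(r"^KiB (Mem|Swap)\s*:\s*(.*)$")
FIELD_RE = re.compile(r"(\d+)\s+(total|free|used|buff/cache|avail Mem)")


def parse_top_memory(text):
    """Return {'Mem': {...}, 'Swap': {...}} with values in KiB."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        kind, rest = match.groups()
        stats[kind] = {name: int(value) for value, name in FIELD_RE.findall(rest)}
    return stats


def swap_used_fraction(stats):
    """Fraction of swap in use, between 0.0 and 1.0."""
    swap = stats["Swap"]
    return swap["used"] / swap["total"]
```

With the numbers above this gives roughly a third of swap in use on a 64 GB node, which together with the 22.4 wa figure suggests the host is under memory and I/O pressure while the deep scrubs run.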

 

 

The server is currently running deep scrubs, which we had turned off for the last two weeks because of a node eviction and a new node install.

 

 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
