Re: Noob question - ceph-mgr crash on arm

Hi

tcmalloc on ARMv7 is problematic, and your backtrace crashes inside it. You need to compile your own Ceph with either jemalloc or plain libc malloc.
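A rough sketch of how one might check for this and then rebuild, in case it helps. Everything here is an assumption rather than something tested on an odroid-hc2: the function names are made up for illustration, and the `ALLOCATOR` CMake cache option (with values such as `libc` or `jemalloc`) is what the Ceph source tree offers as far as I know, so please verify it against the CMakeLists.txt of the release you build.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; nothing runs until you call them.

# Succeeds if the given binary is dynamically linked against tcmalloc.
links_tcmalloc() {
    ldd "$1" 2>/dev/null | grep -q 'libtcmalloc'
}

# Prints how many lines of a log file mention tcmalloc symbols.
count_tcmalloc_frames() {
    grep -c 'tcmalloc::' "$1"
}

# Configures a Ceph source checkout ($1) to use another allocator ($2).
# ASSUMPTION: the tree accepts -DALLOCATOR=libc or -DALLOCATOR=jemalloc.
configure_ceph_allocator() {
    src_dir=$1
    allocator=${2:-libc}
    mkdir -p "$src_dir/build" &&
        cd "$src_dir/build" &&
        cmake -DALLOCATOR="$allocator" ..
}
```

Possible usage: `links_tcmalloc "$(command -v ceph-mgr)" && echo "ceph-mgr uses tcmalloc"` on the mgr node, then `configure_ceph_allocator /path/to/ceph libc` in a source checkout (the path is a placeholder) before building packages as usual.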

/Torben

On 20 May 2019 17:48:40 CEST, "Jesper Taxbøl" <jesper@xxxxxxxxxx> wrote:
I am trying to set up a Ceph cluster on 4 odroid-hc2 instances on top of Ubuntu 18.04.

My ceph-mgr daemon keeps crashing on me.

Any advice on how to proceed?

Log on mgr node says something about ms_dispatch:

2019-05-20 15:34:43.070424 b6714230  0 set uid:gid to 64045:64045 (ceph:ceph)
2019-05-20 15:34:43.070455 b6714230  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-mgr, pid 1169
2019-05-20 15:34:43.070799 b6714230  0 pidfile_write: ignore empty --pid-file
2019-05-20 15:34:43.101162 b6714230  1 mgr send_beacon standby
2019-05-20 15:34:43.124462 b06f8c30 -1 *** Caught signal (Segmentation fault) **
in thread b06f8c30 thread_name:ms_dispatch

ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
1: (()+0x30133c) [0x77033c]
2: (()+0x25750) [0xb688a750]
3: (_ULarm_step()+0x55) [0xb6816ce6]
4: (()+0x255e8) [0xb6cd85e8]
5: (GetStackTrace(void**, int, int)+0x25) [0xb6cd8a3e]
6: (tcmalloc::PageHeap::GrowHeap(unsigned int)+0xb9) [0xb6ccd36a]
7: (tcmalloc::PageHeap::New(unsigned int)+0x79) [0xb6ccd5e6]
8: (tcmalloc::CentralFreeList::Populate()+0x71) [0xb6ccc5ce]
9: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x1b) [0xb6ccc760]
10: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x6d) [0xb6ccc7de]
11: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int)+0x51) [0xb6ccea56]
12: (malloc()+0x22d) [0xb6cd9a8e]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
  -90> 2019-05-20 15:34:43.053293 b6714230  5 asok(0x55b5320) register_command perfcounter
s_dump hook 0x554c088
  -89> 2019-05-20 15:34:43.053322 b6714230  5 asok(0x55b5320) register_command 1 hook 0x55
4c088
  -88> 2019-05-20 15:34:43.053330 b6714230  5 asok(0x55b5320) register_command perf dump h
ook 0x554c088
  -87> 2019-05-20 15:34:43.053341 b6714230  5 asok(0x55b5320) register_command perfcounter
s_schema hook 0x554c088
  -86> 2019-05-20 15:34:43.053360 b6714230  5 asok(0x55b5320) register_command perf histog
ram dump hook 0x554c088
  -85> 2019-05-20 15:34:43.053374 b6714230  5 asok(0x55b5320) register_command 2 hook 0x55
4c088
  -84> 2019-05-20 15:34:43.053381 b6714230  5 asok(0x55b5320) register_command perf schema
hook 0x554c088
  -83> 2019-05-20 15:34:43.053389 b6714230  5 asok(0x55b5320) register_command perf histog
ram schema hook 0x554c088
  -82> 2019-05-20 15:34:43.053410 b6714230  5 asok(0x55b5320) register_command perf reset
hook 0x554c088
  -81> 2019-05-20 15:34:43.053418 b6714230  5 asok(0x55b5320) register_command config show
hook 0x554c088
  -80> 2019-05-20 15:34:43.053425 b6714230  5 asok(0x55b5320) register_command config help
hook 0x554c088
  -79> 2019-05-20 15:34:43.053436 b6714230  5 asok(0x55b5320) register_command config set
hook 0x554c088
  -78> 2019-05-20 15:34:43.053444 b6714230  5 asok(0x55b5320) register_command config get
hook 0x554c088
  -77> 2019-05-20 15:34:43.053459 b6714230  5 asok(0x55b5320) register_command config diff
hook 0x554c088
  -76> 2019-05-20 15:34:43.053467 b6714230  5 asok(0x55b5320) register_command config diff
get hook 0x554c088
  -75> 2019-05-20 15:34:43.053475 b6714230  5 asok(0x55b5320) register_command log flush h
ook 0x554c088
  -74> 2019-05-20 15:34:43.053482 b6714230  5 asok(0x55b5320) register_command log dump ho
ok 0x554c088
  -73> 2019-05-20 15:34:43.053490 b6714230  5 asok(0x55b5320) register_command log reopen
hook 0x554c088
  -72> 2019-05-20 15:34:43.053513 b6714230  5 asok(0x55b5320) register_command dump_mempoo
ls hook 0x56e3504
  -71> 2019-05-20 15:34:43.070424 b6714230  0 set uid:gid to 64045:64045 (ceph:ceph)
  -70> 2019-05-20 15:34:43.070455 b6714230  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d
66faee0ba30ea23eee) luminous (stable), process ceph-mgr, pid 1169
  -69> 2019-05-20 15:34:43.070799 b6714230  0 pidfile_write: ignore empty --pid-file
  -68> 2019-05-20 15:34:43.074441 b6714230  5 asok(0x55b5320) init /var/run/ceph/ceph-mgr.
odroid-c.asok
  -67> 2019-05-20 15:34:43.074473 b6714230  5 asok(0x55b5320) bind_and_listen /var/run/cep
h/ceph-mgr.odroid-c.asok
  -66> 2019-05-20 15:34:43.074615 b6714230  5 asok(0x55b5320) register_command 0 hook 0x55
4c1d0
  -65> 2019-05-20 15:34:43.074633 b6714230  5 asok(0x55b5320) register_command version hoo
k 0x554c1d0
  -64> 2019-05-20 15:34:43.074654 b6714230  5 asok(0x55b5320) register_command git_version
hook 0x554c1d0
  -63> 2019-05-20 15:34:43.074674 b6714230  5 asok(0x55b5320) register_command help hook 0
x554c1d8
  -62> 2019-05-20 15:34:43.074694 b6714230  5 asok(0x55b5320) register_command get_command
_descriptions hook 0x554c1e0
  -61> 2019-05-20 15:34:43.074785 b3effc30  5 asok(0x55b5320) entry start
  -60> 2019-05-20 15:34:43.076464 b36fec30  2 Event(0x554e068 nevent=5000 time_id=1).set_o
wner idx=0 owner=3010456624
  -59> 2019-05-20 15:34:43.076559 b2efdc30  2 Event(0x554e488 nevent=5000 time_id=1).set_o
wner idx=1 owner=3002063920
  -58> 2019-05-20 15:34:43.076643 b26fcc30  2 Event(0x554e1c8 nevent=5000 time_id=1).set_o
wner idx=2 owner=2993671216
  -57> 2019-05-20 15:34:43.077177 b6714230  1  Processor -- start
  -56> 2019-05-20 15:34:43.077298 b6714230  1 -- - start start
  -55> 2019-05-20 15:34:43.077315 b6714230 10 monclient: build_initial_monmap
  -54> 2019-05-20 15:34:43.077362 b6714230 10 monclient: init
  -53> 2019-05-20 15:34:43.077380 b6714230  5 adding auth protocol: cephx
  -52> 2019-05-20 15:34:43.077391 b6714230 10 monclient: auth_supported 2 method cephx
  -51> 2019-05-20 15:34:43.077625 b6714230  2 auth: KeyRing::load: loaded key file /var/li
b/ceph/mgr/ceph-odroid-c/keyring
  -50> 2019-05-20 15:34:43.077761 b6714230 10 monclient: _reopen_session rank -1
  -49> 2019-05-20 15:34:43.077847 b6714230 10 monclient(hunting): picked mon.noname-a con
0x5792d00 addr 192.168.130.131:6789/0
  -48> 2019-05-20 15:34:43.077899 b6714230  1 -- - --> 192.168.130.131:6789/0 -- auth(prot
o 0 33 bytes epoch 0) v1 -- 0x5590680 con 0
  -47> 2019-05-20 15:34:43.077985 b6714230 10 monclient(hunting): _renew_subs
  -46> 2019-05-20 15:34:43.080980 b2efdc30  1 -- 192.168.130.132:0/2049423493 learned_addr
learned my addr 192.168.130.132:0/2049423493
  -45> 2019-05-20 15:34:43.082020 b2efdc30  2 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_c
onnection got newly_acked_seq 0 vs out_seq 0
  -44> 2019-05-20 15:34:43.084528 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 1 0x55aa900 mon_map magic: 0 v1
  -43> 2019-05-20 15:34:43.084615 b06f8c30  1 -- 192.168.130.132:0/2049423493 <== mon.0 19
2.168.130.131:6789/0 1 ==== mon_map magic: 0 v1 ==== 196+0+0 (1694575244 0 0) 0x55aa900 con
0x5792d00
  -42> 2019-05-20 15:34:43.084656 b06f8c30 10 monclient(hunting): handle_monmap mon_map ma
gic: 0 v1
  -41> 2019-05-20 15:34:43.084685 b06f8c30 10 monclient(hunting):  got monmap 1, mon.nonam
e-a is now rank -1
  -40> 2019-05-20 15:34:43.084698 b06f8c30 10 monclient(hunting): dump:
epoch 1
fsid 75cb9a2d-673b-4a32-897a-05470a08ed58
last_changed 2019-05-20 15:02:53.998735
created 2019-05-20 15:02:53.998735
0: 192.168.130.131:6789/0 mon.odroid-b

  -39> 2019-05-20 15:34:43.084956 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 2 0x55a0540 auth_reply(proto 2 0 (0) Success) v1
  -38> 2019-05-20 15:34:43.085011 b06f8c30  1 -- 192.168.130.132:0/2049423493 <== mon.0 19
2.168.130.131:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (4086221156 0
0) 0x55a0540 con 0x5792d00
  -37> 2019-05-20 15:34:43.085053 b06f8c30 10 monclient(hunting): my global_id is 24139
  -36> 2019-05-20 15:34:43.085175 b06f8c30  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x5590d00 con 0
  -35> 2019-05-20 15:34:43.088488 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 3 0x55a0700 auth_reply(proto 2 0 (0) Success) v1
  -34> 2019-05-20 15:34:43.088712 b06f8c30  1 -- 192.168.130.132:0/2049423493 <== mon.0 19
2.168.130.131:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 222+0+0 (1945430716 0
0) 0x55a0700 con 0x5792d00
  -33> 2019-05-20 15:34:43.089295 b06f8c30  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 -- 0x5590680 con 0
  -32> 2019-05-20 15:34:43.097488 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 4 0x55a08c0 auth_reply(proto 2 0 (0) Success) v1
  -31> 2019-05-20 15:34:43.097643 b06f8c30  1 -- 192.168.130.132:0/2049423493 <== mon.0 19
2.168.130.131:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 783+0+0 (327382700 0
0) 0x55a08c0 con 0x5792d00
  -30> 2019-05-20 15:34:43.098725 b06f8c30  1 monclient: found mon.odroid-b
  -29> 2019-05-20 15:34:43.098850 b06f8c30 10 monclient: _send_mon_message to mon.odroid-b
at 192.168.130.131:6789/0
  -28> 2019-05-20 15:34:43.098898 b06f8c30  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- mon_subscribe({mgrmap=0+,monmap=0+}) v2 -- 0x554eb00 con 0
  -27> 2019-05-20 15:34:43.099042 b06f8c30 10 monclient: _check_auth_rotating renewing rot
ating keys (they expired before 2019-05-20 15:34:13.099036)
  -26> 2019-05-20 15:34:43.099183 b06f8c30 10 monclient: _send_mon_message to mon.odroid-b
at 192.168.130.131:6789/0
  -25> 2019-05-20 15:34:43.099271 b06f8c30  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x5590d00 con 0
  -24> 2019-05-20 15:34:43.099404 b6714230  5 monclient: authenticate success, global_id 2
4139
  -23> 2019-05-20 15:34:43.099543 b6714230 10 log_channel(cluster) update_config to_monito
rs: true to_syslog: false syslog_facility: daemon prio: info to_graylog: false graylog_host
: 127.0.0.1 graylog_port: 12201)
  -22> 2019-05-20 15:34:43.099602 b6714230 10 log_channel(audit) update_config to_monitors
: true to_syslog: false syslog_facility: local0 prio: info to_graylog: false graylog_host:
127.0.0.1 graylog_port: 12201)
  -21> 2019-05-20 15:34:43.099970 b6714230  5 asok(0x55b5320) register_command objecter_re
quests hook 0x554c238
  -20> 2019-05-20 15:34:43.100171 b6714230 10 monclient: _renew_subs
  -19> 2019-05-20 15:34:43.100214 b6714230 10 monclient: _send_mon_message to mon.odroid-b
at 192.168.130.131:6789/0
  -18> 2019-05-20 15:34:43.100246 b6714230  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- mon_subscribe({osdmap=0}) v2 -- 0x554ec60 con 0
  -17> 2019-05-20 15:34:43.100737 b6714230  5 asok(0x55b5320) register_command mds_request
s hook 0xbefefe80
  -16> 2019-05-20 15:34:43.100793 b6714230  5 asok(0x55b5320) register_command mds_session
s hook 0xbefefe80
  -15> 2019-05-20 15:34:43.100847 b6714230  5 asok(0x55b5320) register_command dump_cache
hook 0xbefefe80
  -14> 2019-05-20 15:34:43.100811 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 5 0x558dc00 mgrmap(e 99) v1
  -13> 2019-05-20 15:34:43.100915 b6714230  5 asok(0x55b5320) register_command kick_stale_
sessions hook 0xbefefe80
  -12> 2019-05-20 15:34:43.100977 b6714230  5 asok(0x55b5320) register_command status hook
0xbefefe80
  -11> 2019-05-20 15:34:43.100987 b06f8c30  1 -- 192.168.130.132:0/2049423493 <== mon.0 19
2.168.130.131:6789/0 5 ==== mgrmap(e 99) v1 ==== 232+0+0 (4078310027 0 0) 0x558dc00 con 0x5
792d00
  -10> 2019-05-20 15:34:43.101004 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 6 0x55aaa80 mon_map magic: 0 v1
   -9> 2019-05-20 15:34:43.101162 b6714230  1 mgr send_beacon standby
   -8> 2019-05-20 15:34:43.101575 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 7 0x55a0540 auth_reply(proto 2 0 (0) Success) v1
   -7> 2019-05-20 15:34:43.101889 b2efdc30  5 -- 192.168.130.132:0/2049423493 >> 192.168.1
30.131:6789/0 conn(0x5792d00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1
l=1). rx mon.0 seq 8 0x5590d00 osd_map(42..42 src has 1..42) v3
   -6> 2019-05-20 15:34:43.102775 b6714230 10 monclient: _send_mon_message to mon.odroid-b
at 192.168.130.131:6789/0
   -5> 2019-05-20 15:34:43.102838 b6714230  1 -- 192.168.130.132:0/2049423493 --> 192.168.
130.131:6789/0 -- mgrbeacon mgr.odroid-c(75cb9a2d-673b-4a32-897a-05470a08ed58,24139, -, 0)
v6 -- 0x5562400 con 0
   -4> 2019-05-20 15:34:43.102991 b6714230  4 mgr init Complete.
   -3> 2019-05-20 15:34:43.103065 b06f8c30  4 mgr ms_dispatch standby mgrmap(e 99) v1
   -2> 2019-05-20 15:34:43.103110 b06f8c30  4 mgr handle_mgr_map received map epoch 99
   -1> 2019-05-20 15:34:43.103128 b06f8c30  4 mgr handle_mgr_map active in map: 0 active i
s 24134
    0> 2019-05-20 15:34:43.124462 b06f8c30 -1 *** Caught signal (Segmentation fault) **
in thread b06f8c30 thread_name:ms_dispatch

ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
1: (()+0x30133c) [0x77033c]
2: (()+0x25750) [0xb688a750]
3: (_ULarm_step()+0x55) [0xb6816ce6]
4: (()+0x255e8) [0xb6cd85e8]
5: (GetStackTrace(void**, int, int)+0x25) [0xb6cd8a3e]
6: (tcmalloc::PageHeap::GrowHeap(unsigned int)+0xb9) [0xb6ccd36a]
7: (tcmalloc::PageHeap::New(unsigned int)+0x79) [0xb6ccd5e6]
8: (tcmalloc::CentralFreeList::Populate()+0x71) [0xb6ccc5ce]
9: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x1b) [0xb6ccc760]
10: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x6d) [0xb6ccc7de]
11: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int)+0x51) [0xb6ccea56]
12: (malloc()+0x22d) [0xb6cd9a8e]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
  0/ 5 none
  0/ 1 lockdep
  0/ 1 context
  1/ 1 crush
  1/ 5 mds
  1/ 5 mds_balancer
  1/ 5 mds_locker
  1/ 5 mds_log
  1/ 5 mds_log_expire
  1/ 5 mds_migrator
  0/ 1 buffer
  0/ 1 timer
  0/ 1 filer
  0/ 1 striper
  0/ 1 objecter
  0/ 5 rados
  0/ 5 rbd
  0/ 5 rbd_mirror
  0/ 5 rbd_replay
  0/ 5 journaler
  0/ 5 objectcacher
  0/ 5 client
  1/ 5 osd
  0/ 5 optracker
  0/ 5 objclass
  1/ 3 filestore
  1/ 3 journal
  0/ 5 ms
  1/ 5 mon
  0/10 monc
  1/ 5 paxos
  0/ 5 tp
  1/ 5 auth
  1/ 5 crypto
  1/ 1 finisher
  1/ 1 reserver
  1/ 5 heartbeatmap
  1/ 5 perfcounter
  1/ 5 rgw
  1/10 civetweb
  1/ 5 javaclient
  1/ 5 asok
  1/ 1 throttle
  0/ 0 refs
  1/ 5 xio
  1/ 5 compressor
  1/ 5 bluestore
  1/ 5 bluefs
  1/ 3 bdev
  1/ 5 kstore
  4/ 5 rocksdb
  4/ 5 leveldb
  4/ 5 memdb
  1/ 5 kinetic
  1/ 5 fuse
  1/ 5 mgr
  1/ 5 mgrc
  1/ 5 dpdk
  1/ 5 eventtrace
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent     10000
 max_new         1000
 log_file /var/log/ceph/ceph-mgr.odroid-c.log
--- end dump of recent events ---



Kind regards

Jesper 

--
Sent from my mobile phone. Apologies for the brevity.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
