Hi,

I recently did a fresh install of Ceph Octopus 15.2.3. After a few days, the two standby MDS daemons suddenly crashed with a segmentation fault. I tried to restart them, but they do not start. Here is the error:

   -20> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _renew_subs
   -19> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _send_mon_message to mon.2 at v1:172.31.36.98:6789/0
   -18> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcf9530c0 version 269
   -17> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa87520 version 269
   -16> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa875c0 version 269
   -15> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa871c0 version 269
   -14> 2020-07-17T13:50:27.888+0000 7fc8c8c55700 10 monclient: get_auth_request con 0x559dcfada000 auth_method 0
   -13> 2020-07-17T13:50:27.888+0000 7fc8c9456700 10 monclient: get_auth_request con 0x559dcfada800 auth_method 0
   -12> 2020-07-17T13:50:27.892+0000 7fc8bfc43700  1 mds.282966.journaler.mdlog(ro) recover start
   -11> 2020-07-17T13:50:27.892+0000 7fc8bfc43700  1 mds.282966.journaler.mdlog(ro) read_head
   -10> 2020-07-17T13:50:27.892+0000 7fc8bfc43700  4 mds.0.log Waiting for journal 0x200 to recover...
    -9> 2020-07-17T13:50:27.893+0000 7fc8c0444700  1 mds.282966.journaler.mdlog(ro) _finish_read_head loghead(trim 4194304, expire 4231216, write 4329405, stream_format 1). probing for end of log (from 4329405)...
    -8> 2020-07-17T13:50:27.893+0000 7fc8c0444700  1 mds.282966.journaler.mdlog(ro) probing for end of the log
    -7> 2020-07-17T13:50:27.893+0000 7fc8c0444700  1 mds.282966.journaler.mdlog(ro) _finish_probe_end write_pos = 4329949 (header had 4329405). recovered.
    -6> 2020-07-17T13:50:27.893+0000 7fc8bfc43700  4 mds.0.log Journal 0x200 recovered.
    -5> 2020-07-17T13:50:27.893+0000 7fc8bfc43700  4 mds.0.log Recovered journal 0x200 in format 1
    -4> 2020-07-17T13:50:27.893+0000 7fc8bfc43700  2 mds.0.0 Booting: 1: loading/discovering base inodes
    -3> 2020-07-17T13:50:27.893+0000 7fc8bfc43700  0 mds.0.cache creating system inode with ino:0x100
    -2> 2020-07-17T13:50:27.894+0000 7fc8bfc43700  0 mds.0.cache creating system inode with ino:0x1
    -1> 2020-07-17T13:50:27.894+0000 7fc8c0444700  2 mds.0.0 Booting: 2: replaying mds log
     0> 2020-07-17T13:50:27.896+0000 7fc8bec41700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc8bec41700 thread_name:md_log_replay

Here is the cluster information:

# ceph status
  cluster:
    id:     dd024fe1-4996-4fed-ba57-03090e53724d
    health: HEALTH_WARN
            20 daemons have recently crashed

  services:
    mon: 3 daemons, quorum 2,0,1 (age 2d)
    mgr: mgr.0(active, since 9d), standbys: mgr.2, mgr.1
    mds: cephfs:1 {0=node0=up:active} 1 up:standby-replay 1 up:standby
    osd: 3 osds: 3 up (since 28h), 3 in (since 9d)

  task status:
    scrub status:
        mds.node0: idle
        mds.node2: idle

  data:
    pools:   3 pools, 49 pgs
    objects: 29 objects, 170 KiB
    usage:   3.0 GiB used, 41 TiB / 41 TiB avail
    pgs:     49 active+clean

  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr

There is only one client connected to the cluster.

Does anyone have any idea what is going on?

Thanks
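P.S. For completeness, this is roughly how I have been restarting the standbys and how I can pull the full crash backtrace if that would help. The hostname "node1" and the crash ID below are placeholders, and the unit name assumes the standard ceph-mds systemd packaging:

# systemctl restart ceph-mds@node1      <- restart one of the crashed standby MDS daemons
# journalctl -u ceph-mds@node1          <- recent log output for that daemon
# ceph crash ls                         <- list the crashes recorded by the crash module
# ceph crash info <crash-id>            <- full backtrace for a single crash

I can post the "ceph crash info" output for one of the segfaults if that is useful.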