Ceph mon crash, many OSDs down

Hi all.

My cluster's mons are logging many mon data scrub messages like these:

2020-08-20 13:12:16.393 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {auth=100} crc {auth=3066031631})
2020-08-20 13:12:16.395 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {auth=100} crc {auth=221313478})
2020-08-20 13:12:16.401 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {auth=15,config=2,health=10,logm=73} crc {auth=2119885989,config=3307175017,health=67914304,logm=3854202346})
2020-08-20 13:12:16.404 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {logm=100} crc {logm=3116621380})
2020-08-20 13:12:16.408 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {logm=100} crc {logm=767596958})
2020-08-20 13:12:16.411 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {logm=100} crc {logm=3982727178})
2020-08-20 13:12:16.414 7fe89becc700  0 log_channel(cluster) log [DBG] : scrub ok on 0,1,2,3: ScrubResult(keys {logm=100} crc {logm=4144183080})
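For context, those DBG lines look like the routine periodic scrub of the mon store rather than errors. Here is roughly how I checked the scrub cadence and the size of the store on the leader; a minimal sketch, assuming a Nautilus-or-later cluster where "ceph config get" is available and the default on-disk layout for my mon id ceph-mon-1 (from the log above):

# mon store scrub cadence (value is in seconds; the default is one day)
ceph config get mon mon_scrub_interval

# size of the mon store on the leader; the path assumes the default
# /var/lib/ceph layout and the mon id ceph-mon-1
du -sh /var/lib/ceph/mon/ceph-ceph-mon-1/store.db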

After roughly 900 seconds without a beacon, the mon marks OSDs down:

2020-08-20 13:23:32.546 7fe89e6d1700  0 log_channel(cluster) log [INF] : osd.112 marked down after no beacon for 904.106586 seconds
2020-08-20 13:23:32.546 7fe89e6d1700 -1 mon.ceph-mon-1@0(leader).osd e2960665 no beacon from osd.112 since 2020-08-20 13:08:28.441052, 904.106586 seconds ago.  marking down
2020-08-20 13:23:32.551 7fe89e6d1700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2020-08-20 13:24:07.899 7fe89e6d1700  0 log_channel(cluster) log [INF] : osd.263 marked down after no beacon for 901.445447 seconds
2020-08-20 13:24:07.899 7fe89e6d1700 -1 mon.ceph-mon-1@0(leader).osd e2960666 no beacon from osd.263 since 2020-08-20 13:09:06.454891, 901.445447 seconds ago.  marking down
2020-08-20 13:24:07.902 7fe89e6d1700  0 log_channel(cluster) log [WRN] : Health check update: 2 osds down (OSD_DOWN)
2020-08-20 13:24:13.020 7fe89e6d1700  0 log_channel(cluster) log [INF] : osd.384 marked down after no beacon for 900.132560 seconds
2020-08-20 13:24:13.020 7fe89e6d1700 -1 mon.ceph-mon-1@0(leader).osd e2960667 no beacon from osd.384 since 2020-08-20 13:09:12.888844, 900.132560 seconds ago.  marking down
2020-08-20 13:24:13.020 7fe89e6d1700  0 log_channel(cluster) log [INF] : osd.614 marked down after no beacon for 901.359447 seconds
2020-08-20 13:24:13.020 7fe89e6d1700 -1 mon.ceph-mon-1@0(leader).osd e2960667 no beacon from osd.614 since 2020-08-20 13:09:11.661958, 901.359447 seconds ago.  marking down
2020-08-20 13:24:13.026 7fe89e6d1700  0 log_channel(cluster) log [WRN] : Health check update: 4 osds down (OSD_DOWN)
2020-08-20 13:24:18.084 7fe89e6d1700  0 log_channel(cluster) log [INF] : osd.34 marked down after no beacon for 903.818250 seconds
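The ~900 second threshold in those messages matches what I believe is the default mon_osd_report_timeout of 900 seconds, with each OSD expected to send a beacon every osd_beacon_report_interval (default 300 seconds). Here is how I looked at those values and at one of the "down" OSDs; a minimal sketch, assuming a Nautilus-or-later cluster and a non-containerized systemd deployment (osd.112 is just the first id from the log above):

# timeout after which the mon marks an OSD down when no beacon arrives
ceph config get mon mon_osd_report_timeout

# how often each OSD is supposed to send a beacon to the mons
ceph config get osd osd_beacon_report_interval

# check whether the marked-down OSD process is actually still running
# (run on the host that carries osd.112)
systemctl status ceph-osd@112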

Is this a bug in the Ceph mon?

Thanks.


