all OSDs crash at more or less the same time

Hi,

While running cephtool-test-rados.sh, all of a sudden the OSDs
disappear. I had one of the logs open, which ended with:

    -2> 2016-03-06 21:56:02.073226 80569ed00  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x806795200' had timed out after 15
    -1> 2016-03-06 21:56:02.073248 80569ed00  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x806795200' had suicide timed out after 150
     0> 2016-03-06 21:56:02.113948 80569ed00 -1 common/HeartbeatMap.cc:
In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d
*, const char *, time_t)' thread 80569ed00 time 2016-03-06 21:56:02.073269
common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
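For context, my rough understanding of the check that produced those lines, as a
minimal sketch (hypothetical simplification, not the actual Ceph code): each
worker thread periodically refreshes its own timestamp, and a checker compares
that timestamp against a warning grace (15 s here) and a much larger suicide
grace (150 s here), aborting when the latter is exceeded:

```cpp
// Minimal sketch (NOT actual Ceph code) of an internal heartbeat map:
// a worker thread refreshes last_touch; a checker compares the age
// against a warning grace and a larger suicide grace.
#include <cassert>
#include <ctime>

struct heartbeat_handle {
    time_t last_touch;    // refreshed by the worker thread itself
    time_t grace;         // warn after this many seconds (15 in the log)
    time_t suicide_grace; // abort after this many seconds (150 in the log)
};

// Returns false once the warning grace is exceeded ("had timed out
// after 15"); asserts out past the suicide grace ("hit suicide timeout").
bool check(const heartbeat_handle &h, time_t now) {
    bool healthy = true;
    if (now - h.last_touch > h.grace)
        healthy = false;  // warning: thread looks stuck
    assert(now - h.last_touch <= h.suicide_grace &&
           "hit suicide timeout");
    return healthy;
}
```

If that reading is right, the timeout is about the OSD's own worker thread not
making progress, not about messages from peers.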

The monitor is still running. It claims the heartbeat_map is valid, yet
the OSD still commits suicide?

And what messages would prevent this from happening?
Receiving heartbeats from the other OSDs?

If so, how would a 2-OSD cluster even survive if its connection were
split for longer than 2.5 minutes?

--WjW
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


