Hi,

(Sorry if this gets posted twice. I forgot a subject in the first mail.)

We experienced an outage this morning on a Jewel cluster with 1559 OSDs. It appeared that a switch uplink in a rack misbehaved, and once we shut down that interface, Ceph health was restored quickly.

I have some questions, though, about OSD behaviour that I hope someone can answer.

1 - In a lot of OSD logs I saw that neighbours reported the OSD down (while the process was still running and obviously logging). Then, after a while, the logs show

* Got signal Interrupt
* prepare_to_stop starting shutdown

and the OSD process stops.

Why does the OSD process stop? Is it instructed to do so by the monitor because neighbours reported it down and Ceph wants to avoid flapping?

2 - The OSDs reported a lot of

* heartbeat_check: no reply from #ip:#port

When I telnet to the IP and port, I get a connection just fine. Is there a way to run a heartbeat_check from the command line, so that we can try to capture the traffic and determine why it fails?

Thanks,
Marcel