osds processes shutdown during outage

Hi,

(sorry if this gets posted twice. I forgot a subject in the first mail)

We experienced an outage this morning on a jewel cluster with 1559 osds.
It appears that a switch uplink in a rack misbehaved; once we shut that
interface down, ceph health recovered quickly. I have some questions though
on osd behaviour that I hope someone can answer.

1 - In a lot of osd logs I saw that neighbours reported the osd down
(while the process was still running and obviously logging). Then after a
while the logs show

  * Got signal Interrupt
  * prepare_to_stop starting shutdown

and the osd process stops.

Why does the osd process stop? Is it instructed to do so by the monitor
because neighbours reported it down and ceph wants to avoid flapping?

2 - The osds reported a lot of

  * heartbeat_check: no reply from #ip:#port

When I telnet to the ip and port I get a connection just fine. Is there a
way to run a heartbeat_check from the command line so that we can try to
capture the traffic and determine why it fails?
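
For reference, the manual telnet check could be scripted roughly as below. This is only a sketch: it assumes the log lines look like the "heartbeat_check: no reply from #ip:#port" message quoted above, with an IPv4 address in place of #ip, and the function names are made up for illustration. It only tests whether a plain TCP connection can be opened, as telnet does. It does not send an actual osd heartbeat message, so it cannot by itself explain why the heartbeats fail.

```python
import re
import socket

# Assumed log shape, based on the message quoted in this mail:
#   heartbeat_check: no reply from <ipv4>:<port>
NO_REPLY = re.compile(
    r"heartbeat_check: no reply from (\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
)


def peers_without_reply(log_lines):
    """Collect the distinct (ip, port) pairs named in 'no reply' lines."""
    peers = set()
    for line in log_lines:
        match = NO_REPLY.search(line)
        if match:
            peers.add((match.group(1), int(match.group(2))))
    return peers


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout.

    This is the same check as a manual telnet: a TCP connect only,
    not an osd heartbeat exchange.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_log(path):
    """Print TCP reachability for every peer reported in an osd log file."""
    with open(path, errors="replace") as handle:
        for host, port in sorted(peers_without_reply(handle)):
            state = "reachable" if tcp_reachable(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
```

Calling check_log() with the path to one osd log file would list each reported peer once, together with whether a TCP connection to it currently succeeds.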

Thanks

Marcel