OSDs are down, don't know why

Hello,

I'm setting up a small test instance of Ceph and I'm running into a situation where the OSDs are shown as down, but I don't know why.

Connectivity seems to be working. The OSD hosts can communicate with the MON hosts: running "ceph status" and "ceph osd in" from an OSD host works, but the cluster reports HEALTH_WARN with 2 osds: 0 up, 2 in. Both the OSD and MON daemons appear to be running. I can nc from the OSD host to port 6789 on the MON, and from the MON host to ports 6800-6803 on the OSD (I have constrained the ms bind port min/max config options so that the OSDs use only these ports). Neither the OSD nor the MON logs show anything unusual, and nothing in them explains why the OSDs are marked down.
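
For anyone who wants to repeat the reachability checks without nc, here is a minimal Python sketch of the same test. It only attempts a plain TCP connection, so it shows that a port accepts connections and nothing about the Ceph protocol itself. The hostnames in the usage comments are hypothetical placeholders; the ports are the ones mentioned above.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port in `ports` to True (reachable) or False (not reachable)."""
    return {port: port_open(host, port, timeout) for port in ports}


# Example usage (hostnames are hypothetical, substitute your own):
#   from an OSD host:  check_ports("mon1.example.com", [6789])
#   from a MON host:   check_ports("osd1.example.com", range(6800, 6804))
```

A result of True for every port matches what nc showed here: the TCP path is open in both directions, so the cause of the "down" state has to be looked for elsewhere.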

Furthermore, using tcpdump I've watched the network traffic between the OSD and the MON, and the OSD appears to be sending heartbeats and getting an ack from the MON. So I really don't understand why the MON thinks the OSD is down.

Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no heartbeat?
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?

This is Ceph 0.94 on Debian Jessie.

Best,
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


