Hello,
I'm setting up a small test instance of Ceph, and the OSDs are being
shown as down, but I can't tell why.
Connectivity seems to be working. The OSD hosts are able to communicate
with the MON hosts: running "ceph status" and "ceph osd in" from an OSD
host works, but the status reports HEALTH_WARN with 2 osds: 0 up, 2 in.
Both the OSD and MON daemons seem to be running fine. Network
connectivity seems to be okay: I can nc from the OSD to port 6789 on the
MON, and from the MON to port 6800-6803 on the OSD (I have constrained
the ms bind port min/max config options so that the OSDs will use only
these ports). Neither the OSD nor the MON logs show anything unusual, or
any reason why the OSD is marked as down.
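For reference, the nc checks above amount to roughly the following
Python sketch. It only tests whether a TCP connection can be opened,
nothing Ceph-specific, and the function names are my own, not part of
any Ceph tooling.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port in `ports` to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

From an OSD host I would call check_ports("<mon host>", [6789]), and
from the MON host check_ports("<osd host>", range(6800, 6804)); the
host names are placeholders for my own machines. A True result only
shows that the port accepts connections, not that the daemon behind it
is healthy.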
Furthermore, using tcpdump I've watched the network traffic between the
OSD and the MON, and it appears that the OSD is sending heartbeats and
getting an ack from the MON. So I really don't understand why the MON
thinks the OSD is down.
Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no
heartbeat?
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?
I'm running Ceph 0.94 on Debian Jessie.
Best,
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com