Re: OSDs are down, don't know why

Unfortunately, I haven't seen any obviously suspicious log messages from either the OSD or the MON. Is there a way to query detailed information on OSD monitoring, e.g. heartbeats?
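In case it helps clarify what I'm after: I was hoping for something along the lines of an admin socket query on the OSD host (osd id 0 and the default socket path are just assumptions here), e.g.

    ceph daemon osd.0 config show | grep heartbeat    # shows which heartbeat-related settings are in effect

but I don't know of anything that dumps live heartbeat state, which is why I'm asking.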

On 01/18/2016 05:54 PM, Steve Taylor wrote:
With a single osd there shouldn't be much to worry about. It will have to get caught up on map epochs before it will report itself as up, but on a new cluster that should be pretty immediate.
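If you want to check where it is in that process, something like this should do it (assuming osd.0 and its admin socket on the osd host):

    ceph daemon osd.0 status          # the osd's own view: its state (booting vs active) and the osdmap range it holds
    ceph osd dump | grep ^epoch       # the current osdmap epoch according to the mon

If the osd sits in the booting state, or its newest map stays well behind the mon's epoch, that's a useful clue.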

You'll probably have to look for clues in the osd and mon logs. I would expect some sort of error reported in this scenario. It seems likely that it would be network-related in this case, but the logs will confirm or debunk that theory.
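If the default log levels don't show anything, you can turn up the relevant debug levels at runtime, e.g. (assuming osd.0; adjust the mon id to your setup):

    ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
    ceph tell mon.<id> injectargs '--debug-mon 10 --debug-ms 1'

and then watch /var/log/ceph/ceph-osd.0.log and the mon log while the osd tries to come up. debug-ms 1 in particular should show whether the boot and heartbeat messages are actually being exchanged.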

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete it, together with any attachments.


-----Original Message-----
From: Jeff Epstein [mailto:jeff.epstein@xxxxxxxxxxxxxxxx]
Sent: Monday, January 18, 2016 8:32 AM
To: Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx>; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re:  OSDs are down, don't know why

Hi Steve
Thanks for your answer. I don't have a private network defined.
Furthermore, in my current testing configuration, there is only one OSD, so communication between OSDs should be a non-issue.
Do you know how OSD up/down state is determined when there is only one OSD?
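I'm guessing the mon-side settings that govern this could at least be inspected with something like (assuming the mon's admin socket is available on the mon host):

    ceph daemon mon.<id> config show | grep mon_osd    # report timeouts, down/out intervals, reporter thresholds

but I don't know how those apply when there are no peer OSDs to report on each other.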
Best,
Jeff

On 01/18/2016 03:59 PM, Steve Taylor wrote:
Do you have a ceph private network defined in your config file? I've seen this before when the private network isn't functional: the osds can talk to the mon(s) but not to each other, so they report each other as down even though they're all running just fine.
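For reference, that's the situation where ceph.conf has something along these lines (the addresses here are just placeholders):

    [global]
    public network  = 10.0.0.0/24
    cluster network = 10.0.1.0/24

If a cluster (private) network is defined but isn't actually reachable between the osd hosts, you get exactly this symptom. If no cluster network is defined, everything uses the public network and this particular failure mode doesn't apply.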


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete it, together with any attachments.


-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of Jeff Epstein
Sent: Friday, January 15, 2016 7:28 PM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject:  OSDs are down, don't know why

Hello,

I'm setting up a small test instance of ceph and I'm running into a situation where the OSDs are being shown as down, but I don't know why.

Connectivity seems to be working. The OSD hosts are able to communicate with the MON hosts; running "ceph status" and "ceph osd in" from an OSD host works fine, but with a HEALTH_WARN that I have 2 osds: 0 up, 2 in.
Both the OSD and MON daemons seem to be running fine, and network connectivity seems to be okay: I can nc from the OSD to port 6789 on the MON, and from the MON to ports 6800-6803 on the OSD (I have constrained the ms bind port min/max config options so that the OSDs will use only these ports). Neither the OSD nor the MON logs show anything unusual, or any indication of why the OSD is marked as down.
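For reference, the relevant config is roughly this (values reconstructed from what I described above):

    [osd]
    ms bind port min = 6800
    ms bind port max = 6803

and the connectivity checks were along the lines of:

    nc -zv <mon-host> 6789         # from the OSD host
    nc -zv <osd-host> 6800-6803    # from the MON host

One thing I'm unsure about: as far as I understand, a single OSD daemon can bind up to four ports (public, cluster, and two heartbeat ports), so a 6800-6803 range leaves no headroom at all.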

Furthermore, using tcpdump I've watched the network traffic between the OSD and the MON, and it appears that the OSD is sending heartbeats and getting acks from the MON. So I'm really not sure why the MON thinks the OSD is down.
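For what it's worth, the capture was done with something roughly like this (interface and filter are approximate):

    tcpdump -i any -nn 'port 6789 or portrange 6800-6803'

which covers both the mon port and the OSD ports.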

Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no heartbeat? (See the examples just below for the kind of output I mean.)
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?
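To make the second question concrete, commands like these (osd.0 is just an example) tell me that an OSD is down, but as far as I can tell not why:

    ceph health detail     # lists which osds are down and since when
    ceph osd dump          # per-osd up/down/in/out flags and addresses
    ceph osd find 0        # where the mon thinks osd.0 is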

ceph 0.94 on Debian Jessie

Best,
Jeff
