OSDs missing from cluster all from one node

Yesterday I noticed that some OSDs were missing from our cluster: of 96 OSDs total, the status showed only 84 up/84 in.

After drilling down to identify the node and the cause, I found that all 12 OSDs on that node were in fact down.

I ran 'systemctl status ceph-osd@$osd_number' to determine exactly why they were down, and it showed:
Fail to open '/proc/0/cmdline' error = (2) No such file or directory
received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0
osd.72 1067 *** Got signal Interrupt ***
osd.72 1067 shutdown
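
For anyone wanting to check the other OSDs for the same pattern, here is a minimal Python sketch that pulls the signal name, the reported sender PID/UID, and the OSD id out of log lines shaped like the ones quoted above. The regular expressions are modeled only on these four lines, so the format is an assumption and may differ between Ceph releases; `summarize` and `SAMPLE` are illustrative names, not part of any Ceph tooling.

```python
import re

# Assumption: patterns are derived only from the log lines quoted in this post.
SIGNAL_RE = re.compile(
    r"received\s+signal:\s*(?P<signal>\S+)\s+from\s+PID:\s*(?P<pid>\d+)"
    r".*?UID:\s*(?P<uid>\d+)"
)
OSD_RE = re.compile(r"\bosd\.(?P<osd>\d+)\s+\d+\s+\*\*\* Got signal \w+ \*\*\*")


def summarize(lines):
    """Collect signal name, sender PID/UID and OSD id from the given log lines."""
    info = {}
    for line in lines:
        m = SIGNAL_RE.search(line)
        if m:
            info["signal"] = m.group("signal")
            info["sender_pid"] = int(m.group("pid"))
            info["sender_uid"] = int(m.group("uid"))
        m = OSD_RE.search(line)
        if m:
            info["osd"] = int(m.group("osd"))
    return info


# The four lines reported for osd.72 in this post.
SAMPLE = [
    "Fail to open '/proc/0/cmdline' error = (2) No such file or directory",
    "received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0",
    "osd.72 1067 *** Got signal Interrupt ***",
    "osd.72 1067 shutdown",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The sender is reported as PID 0, which would explain why the lookup of '/proc/0/cmdline' fails and the task name shows as <unknown>.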

This happened on all twelve OSDs (osd.72-osd.83). Four went down the previous evening around 9pm EST, and the other eight went down at roughly 2am EST on the morning I discovered the issue (around 9am EST).
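
Checking twelve units one at a time is tedious, so here is a rough Python sketch of the same check in a loop. It assumes the 'ceph-osd@<id>' unit naming used above and relies on 'systemctl is-active --quiet', which exits 0 only when a unit is active. The helper names (`osd_units`, `is_active`, `report`) are hypothetical.

```python
import subprocess


def osd_units(first, last):
    """Build systemd unit names for an inclusive range of OSD ids."""
    return [f"ceph-osd@{n}" for n in range(first, last + 1)]


def is_active(unit):
    """Return True if systemd reports the unit as active."""
    # 'systemctl is-active --quiet' prints nothing and exits 0 only when active.
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def report(units, check=is_active):
    """Map each unit name to its active state; 'check' can be swapped for testing."""
    return {unit: check(unit) for unit in units}
```

Run on the affected node, `report(osd_units(72, 83))` would return a dict mapping each of the twelve unit names to True or False.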

Has anyone come across something like this, or does anyone know of a fix? It hasn't happened since, but as this is a newly built-out cluster, it was a bit concerning.

Thanks in advance.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


