How to find out why osd crashed with cephadm/podman containers?

Hello,

I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started copying data to my CephFS (kernel mount), but both OSDs on that specific node crashed.

Regarding this, I have the following questions:

1) How can I find out why the two OSDs crashed? Because everything runs in podman containers, I don't know where to find the logs that would show the reason. From the OS itself everything looks OK, and there was no out-of-memory error.
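
My best guess so far is something like the commands below, but I am not sure these are the right places to look. Here "osd.3", "<fsid>" and "<crash-id>" are only placeholders for one of the failed daemon IDs, my cluster fsid and a crash ID, not my real values:

    cephadm ls                                  # on the OSD host: list the daemons cephadm knows about and their status
    cephadm logs --name osd.3                   # show the journald logs of that daemon's container
    journalctl -u ceph-<fsid>@osd.3.service     # same logs, directly via the systemd unit on the host
    ceph crash ls                               # list crashes recorded by the crash module
    ceph crash info <crash-id>                  # show the backtrace of a specific crash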

2) I would have assumed the two OSD containers would restart on their own, but it looks like this is not the case. How can I manually restart these 2 OSD containers on that node? I believe this should be a "ceph orch" command?
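
If I had to guess, it would be one of the following (again, "osd.3" and "<fsid>" are only placeholders):

    ceph orch daemon restart osd.3                  # restart a single daemon through the orchestrator
    systemctl restart ceph-<fsid>@osd.3.service     # or restart the systemd unit directly on the OSD host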

The health of the cluster right now is:

    CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded (33.333%), 65 pgs degraded, 65 pgs undersized

Thank you for your hints.

Best regards,
Mabi
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


