Re: ceph octopus mysterious OSD crash


 



On 3/18/21 9:28 PM, Philip Brown wrote:
I've been banging on my ceph octopus test cluster for a few days now.
8 nodes; each node has 2 SSDs and 8 HDDs.
They were all autoprovisioned with the following service spec, so that each HDD gets an LVM slice of an SSD as a db partition:

service_type: osd
service_id: osd_spec_default
placement:
   host_pattern: '*'
data_devices:
   rotational: 1
db_devices:
   rotational: 0
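
For reference, a spec like the one above is typically applied and then checked with the orchestrator CLI, roughly like this (the file name is illustrative):

ceph orch apply -i osd_spec_default.yml    # apply/update the OSD service spec
ceph orch ls osd                           # confirm the osd service and its placement
ceph orch ps                               # list the daemons the orchestrator deployed
ceph orch device ls                        # show which devices were consumed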


Things were going pretty well, until yesterday I noticed TWO of the OSDs were "down".

I went to check the logs with:
journalctl -u ceph-xxxx@xxxxxxx

All it showed was a bunch of generic debug info, the fact that the daemon stopped,
and various automatic attempts to restart it,
but no indication of what was wrong, or why the restarts KEEP failing.
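
One place that often records why a daemon died even when journalctl only shows generic output is the mgr crash module; a minimal sketch, assuming it is enabled (it usually is by default on Octopus) and using <crash-id> as a placeholder:

ceph crash ls                # list crash dumps collected from the daemons
ceph crash info <crash-id>   # show the metadata and backtrace for one crash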


Is this a deployment made with cephadm? It looks like it, as I see podman messages. Are these all the log messages you can find for those OSDs? I.e., have you tried to gather logs with cephadm logs [1]?

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephadm/troubleshooting/#gathering-log-files
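
As a rough usage sketch of the cephadm logs approach from [1], run on the host that carries the failed daemon (osd.NN is a placeholder for the real id):

cephadm ls                   # list the daemons cephadm manages on this host
cephadm logs --name osd.NN   # fetch the journald logs for that daemon
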
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


