Re: ceph octopus mysterious OSD crash


 



On 3/18/21 9:28 PM, Philip Brown wrote:
I've been banging on my ceph octopus test cluster for a few days now.
8 nodes; each node has 2 SSDs and 8 HDDs.
They were all autoprovisioned with the following service spec, so that each HDD gets an LVM slice of an SSD as a db partition:

service_type: osd
service_id: osd_spec_default
placement:
   host_pattern: '*'
data_devices:
   rotational: 1
db_devices:
   rotational: 0
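
For reference, a spec like the one above is typically applied and then checked with the orchestrator CLI, roughly like this (the file name is illustrative):

ceph orch apply -i osd_spec_default.yml    # apply/update the OSD service spec
ceph orch ls osd                           # confirm the osd service and its placement
ceph orch ps                               # list the daemons the orchestrator deployed
ceph orch device ls                        # show which devices were consumed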


Things were going pretty well, until yesterday I noticed TWO of the OSDs were "down".

I went to check the logs with:
journalctl -u ceph-xxxx@xxxxxxx

All it showed was a bunch of generic debug info, the fact that the daemon stopped,
and various automatic attempts to restart it,
but no indication of what was wrong, or why the restarts KEEP failing.
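
One place that often records why a daemon died even when journalctl only shows generic output is the mgr crash module; a minimal sketch, assuming it is enabled (it usually is by default on Octopus) and using <crash-id> as a placeholder:

ceph crash ls                # list crash dumps collected from the daemons
ceph crash info <crash-id>   # show the metadata and backtrace for one crash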


Is this a deployment made with cephadm? It looks like it, as I see podman messages. Are these all the log messages you can find for those OSDs? I.e., have you tried to gather logs with cephadm logs [1]?

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephadm/troubleshooting/#gathering-log-files
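
As a rough usage sketch of the cephadm logs approach from [1], run on the host that carries the failed daemon (osd.NN is a placeholder for the real id):

cephadm ls                   # list the daemons cephadm manages on this host
cephadm logs --name osd.NN   # fetch the journald logs for that daemon
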
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


