Yup, cephadm and orch were used to set all this up.

Current state of things:

ceph osd tree shows

  33    hdd    1.84698    osd.33    destroyed    0    1.00000

cephadm logs --name osd.33 --fsid xx-xx-xx-xx, along with the systemctl stuff
I already saw, showed me new things such as

  ceph-osd[1645438]: did not load config file, using default settings.
  ceph-osd[1645438]: 2021-03-18T14:31:32.990-0700 7f8bf14e3bc0 -1 parse_file: filesystem error: cannot get file size: No such file or directory

This suggested to me that I needed to copy /etc/ceph/ceph.conf over to the
OSD node, which I did.

I then also copied over the admin key and, just for good measure, generated a
fresh bootstrap-osd key with it:

  ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

I had saved the previous output of ceph-volume lvm list, so on the OSD node
I ran

  ceph-volume lvm prepare --data xxxx --block.db xxxx

but it says the OSD is already prepared. I tried an activate... it tells me

  --> ceph-volume lvm activate successful for osd ID: 33

but now the cephadm logs output shows me

  ceph-osd[1677135]: 2021-03-18T17:57:47.982-0700 7ff64593f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

Not the best error message :-}

Now what do I need to do?


----- Original Message -----
From: "Stefan Kooman" <stefan@xxxxxx>
To: "Philip Brown" <pbrown@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxx>
Sent: Thursday, March 18, 2021 2:04:09 PM
Subject: Re: ceph octopus mysterious OSD crash

On 3/18/21 9:28 PM, Philip Brown wrote:
> I've been banging on my ceph octopus test cluster for a few days now.
> 8 nodes. Each node has 2 SSDs and 8 HDDs.
> They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a db partition.
>
> service_type: osd
> service_id: osd_spec_default
> placement:
>   host_pattern: '*'
> data_devices:
>   rotational: 1
> db_devices:
>   rotational: 0
>
> Things were going pretty good, until... yesterday... I noticed TWO of the OSDs were "down".
>
> I went to check the logs, with
>
>   journalctl -u ceph-xxxx@xxxxxxx
>
> All it showed were a bunch of generic debug info, the fact that it stopped,
> and various automatic attempts to restart,
> but no indication of what was wrong, and why the restarts KEEP failing.

It's a deployment made with cephadm? Looks like it is, as I see podman
messages. Are these all the log messages you can find on those OSDs? I.e.
have you tried to gather logs with cephadm logs [1]?

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephadm/troubleshooting/#gathering-log-files

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
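
For reference, the handle_auth_bad_method line quoted above usually means cephx
authentication failed because the key the OSD presents does not match what the
monitors hold for osd.33, rather than an actual auth-method mismatch. A minimal
check along those lines (the /var/lib/ceph/osd/ceph-33 path is an assumption for
an OSD activated directly with ceph-volume; a cephadm-managed OSD keeps its
keyring under /var/lib/ceph/<fsid>/osd.33/ instead):

  # Key the monitors expect for osd.33
  ceph auth get osd.33

  # Key the locally activated OSD will present (path assumed, see note above)
  cat /var/lib/ceph/osd/ceph-33/keyring

If the two keys differ, that mismatch would account for the authentication
failure seen in the OSD log.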