Hi there,

On our Pacific (16.2.9) cluster one of the OSD daemons has died and fails to restart. The OSD is backed by an NVMe drive on one of 4 identical machines; the ceph daemons are orchestrated with podman and the underlying OS is centrally managed. The system worked without any issues until recently, and the other 3 machines are still working fine. The NVMe drive itself reports no errors.

The systemd unit fails again after a restart, and rebooting the machine didn't help either. We end up with an awful lot of output in the log, which is difficult to sift through. The OSD is part of a pool with replication level 5 that holds the metadata for a CephFS.

Any suggestion what to look for?

Cheers,
magnus
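P.S. In case it is useful context, this is roughly how I have been trying to sift the log so far: a quick Python filter that only prints lines matching what I guess are likely failure markers (asserts, rocksdb/bluefs/bluestore complaints, generic errors) plus a little leading context. The patterns and the example log path in the comment are just my assumptions, not anything official.

#!/usr/bin/env python3
"""Rough filter for a ceph-osd log: print only lines that look like
errors or asserts, plus a few lines of leading context.  The patterns
below are guesses at likely failure markers, not an authoritative list."""
import re
import sys
from collections import deque

# Patterns I am guessing are worth surfacing; adjust as needed.
PATTERNS = re.compile(
    r"(assert|abort|ERR|error|fail|corrupt|rocksdb|bluefs|bluestore|checksum)",
    re.IGNORECASE,
)
CONTEXT = 3  # lines of leading context to keep around each hit

def sift(path):
    before = deque(maxlen=CONTEXT)
    with open(path, errors="replace") as fh:
        for line in fh:
            if PATTERNS.search(line):
                # dump the buffered context, then the matching line
                for prev in before:
                    sys.stdout.write(prev)
                sys.stdout.write(line)
                before.clear()
            else:
                before.append(line)

if __name__ == "__main__":
    # e.g. sift("/var/log/ceph/<fsid>/ceph-osd.NN.log") -- path is a guess,
    # on our hosts I first dump the journal for the OSD unit to a file.
    sift(sys.argv[1])

Even with this the remaining output is still a lot to go through, hence the question about what specifically to look for.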