Hi,
We currently run Ceph Pacific 16.2.10 deployed with Cephadm on this
storage cluster. Last night, one of our OSDs died. Since its backing
device is an SSD, we ran hardware checks, and they found no issue with
the drive itself. However, when we try starting the service again, the
container crashes about one second after booting up. Looking at the
logs, there is no error: the OSD starts up normally, and the last line
before the crash is:
debug 2023-04-05T18:32:57.433+0000 7f8078e0c700 1 osd.87 pg_epoch:
207175 pg[2.99s3( v 207174'218628609 (207134'218623666,207174'218628609]
local-lis/les=207140/207141 n=38969 ec=41966/315 lis/c=207140/207049
les/c/f=207141/207050/0 sis=207175 pruub=11.464111328s)
[5,228,217,NONE,17,25,167,114,158,178,159]/[5,228,217,87,17,25,167,114,158,178,159]p5(0)
r=3 lpr=207175 pi=[207049,207175)/1 crt=207174'218628605 mlcod 0'0
remapped NOTIFY pruub 12054.601562500s@ mbc={}] state<Start>:
transitioning to Stray
I don't really see how this line could cause the OSD to crash. Systemd
just writes:
Stopping Ceph osd.83 for (uuid)
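For reference, these are roughly the commands I'm using to pull those
logs (the OSD id is the one from the log above, <fsid> is a placeholder
for our cluster fsid):

    # daemon log as captured by cephadm for this OSD
    cephadm logs --name osd.87

    # same thing through journalctl against the cephadm-managed unit
    journalctl -u ceph-<fsid>@osd.87.service --since "2023-04-05 18:00"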
What could cause this OSD to boot up and then suddenly die? Besides the
Ceph daemon logs and the systemd logs, is there another way I could
gather more information?
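In case it's relevant, this is what I was thinking of trying next to get
more detail out of the daemon before it dies (standard commands, the OSD
id is again just the one from the log above):

    # raise the OSD's debug level before the next start attempt
    ceph config set osd.87 debug_osd 20
    ceph config set osd.87 debug_ms 1

    # check whether the daemon left a crash report behind
    ceph crash ls
    ceph crash info <crash-id>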
--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx