Mysteriously dead OSD process

Hi,


We currently run Ceph Pacific 16.2.10, deployed with Cephadm, on this storage cluster. Last night, one of our OSDs died. Since its backing device is an SSD, we ran hardware checks but found no issue with the drive itself. However, if we try to start the service again, the container crashes about one second after booting up. The logs show no error: the OSD starts up normally, and the last line before the crash is:

debug 2023-04-05T18:32:57.433+0000 7f8078e0c700  1 osd.87 pg_epoch: 207175 pg[2.99s3( v 207174'218628609 (207134'218623666,207174'218628609] local-lis/les=207140/207141 n=38969 ec=41966/315 lis/c=207140/207049 les/c/f=207141/207050/0 sis=207175 pruub=11.464111328s) [5,228,217,NONE,17,25,167,114,158,178,159]/[5,228,217,87,17,25,167,114,158,178,159]p5(0) r=3 lpr=207175 pi=[207049,207175)/1 crt=207174'218628605 mlcod 0'0 remapped NOTIFY pruub 12054.601562500s@ mbc={}] state<Start>: transitioning to Stray

I don't really see how this line could cause the OSD to crash. Systemd just writes:

Stopping Ceph osd.83 for (uuid)

What could cause this OSD to boot up and then suddenly die? Apart from the ceph daemon logs and the systemd logs, is there another way I could gain more information?
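For completeness, this is roughly how I have been pulling the daemon and systemd logs so far, plus what I was planning to try next to get more verbosity. This is only a rough sketch, assuming the standard cephadm unit naming with our cluster fsid, and using osd.87 as shown in the log line above:

    # journald output for the cephadm-managed OSD unit
    cephadm logs --name osd.87
    journalctl -u ceph-<fsid>@osd.87 --since "2023-04-05 18:00"

    # any recorded crash metadata
    ceph crash ls
    ceph crash info <crash-id>

    # raise OSD log verbosity before the next start attempt
    ceph config set osd.87 debug_osd 20
    ceph config set osd.87 debug_ms 1

If there is a better way to capture what happens in that last second before the container exits, I would appreciate any pointers.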

--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



