Hi J-P Methot,
perhaps my response is a bit late, but this reminds me somewhat of an
issue we were facing just yesterday.
First of all, you might want to set debug_osd to 20 for this specific OSD
and see if the log is more helpful. Please share it if possible.
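Something along these lines should raise the log level (I'm using osd.87
from the log snippet below, adjust to the actual OSD id):

   ceph config set osd.87 debug_osd 20
   # or, if the daemon stays up long enough to accept it:
   ceph tell osd.87 injectargs '--debug-osd 20'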
Secondly, I'm curious whether the last reported PG (2.99s3) is always the
same one before the crash. If so, you might want to remove it from the
OSD using ceph-objectstore-tool's export-remove command - in our case
this helped bring the OSD back up. The exported PG can then be imported
into another OSD or (if this is the only problematic OSD) simply thrown
away and repaired by scrubbing...
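Roughly like this, with the OSD stopped (the data path and export file
name are just examples, adjust to your deployment - under cephadm you
would enter the daemon's container first, e.g. with
"cephadm shell --name osd.87"):

   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-87 \
       --pgid 2.99s3 --op export-remove --file /tmp/2.99s3.export

   # optionally import it into another (stopped) OSD later:
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-XX \
       --pgid 2.99s3 --op import --file /tmp/2.99s3.export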
Thanks,
Igor
On 05/04/2023 23:36, J-P Methot wrote:
Hi,
We currently use Ceph Pacific 16.2.10 deployed with Cephadm on this
storage cluster. Last night, one of our OSDs died. Since its storage is
an SSD, we ran hardware checks and found no issue with the SSD itself.
However, if we try starting the service again, the container just crashes
one second after booting up. If I look at the logs, there's no error: you
can see the OSD starting up normally, and then the last line before the
crash is:
debug 2023-04-05T18:32:57.433+0000 7f8078e0c700 1 osd.87 pg_epoch:
207175 pg[2.99s3( v 207174'218628609
(207134'218623666,207174'218628609] local-lis/les=207140/207141
n=38969 ec=41966/315 lis/c=207140/207049 les/c/f=207141/207050/0
sis=207175 pruub=11.464111328s)
[5,228,217,NONE,17,25,167,114,158,178,159]/[5,228,217,87,17,25,167,114,158,178,159]p5(0)
r=3 lpr=207175 pi=[207049,207175)/1 crt=207174'218628605 mlcod 0'0
remapped NOTIFY pruub 12054.601562500s@ mbc={}] state<Start>:
transitioning to Stray
I don't really see how this line could cause the OSD to crash. Systemd
just writes:
Stopping Ceph osd.83 for (uuid)
What could cause this OSD to boot up and then suddenly die? Apart from
the Ceph daemon logs and the systemd logs, is there another way I could
gain more information?