Hi Patrick,
Please share the OSD restart log so we can investigate that.
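Since the OSDs look to be cephadm/podman-managed (there is a left-over
podman process in the unit's control group), the systemd journal for
that unit is usually the easiest place to grab it, e.g. something along
these lines:

  # journal of the failing unit around the restart attempts
  journalctl -u ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service --since "2023-09-21"

  # or let cephadm collect it
  cephadm logs --fsid 250f9864-0142-11ee-8e5f-00266cf8869c --name osd.2

If file logging is enabled, the OSD's own startup output should also be
under /var/log/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/ceph-osd.2.log
on that host.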
Thanks,
Igor
On 21/09/2023 13:41, Patrick Begou wrote:
Hi,
After a power outage on my test Ceph cluster, two OSDs fail to restart.
The log file shows:
8e5f-00266cf8869c@osd.2.service: Failed with result 'timeout'.
Sep 21 11:55:02 mostha1 systemd[1]: Failed to start Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Service RestartSec=10s expired, scheduling restart.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Scheduled restart job, restart counter is at 2.
Sep 21 11:55:12 mostha1 systemd[1]: Stopped Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 1858 (bash) in control group while starting unit. Ignoring.
Sep 21 11:55:12 mostha1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 2815 (podman) in control group while starting unit. Ignoring.
This is not critical since it is a test cluster, and the data is
actually rebalancing onto the other OSDs, but I would like to know how
to return to HEALTH_OK status.
Smartctl shows the HDDs are OK.
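(For reference, the check was something along these lines; the device
name is only an example:

  smartctl -H /dev/sdb   # overall SMART health self-assessment
  smartctl -a /dev/sdb   # full attributes and error log

and it comes back clean.)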
So is there a way to recover the OSDs from this state? The version is
15.2.17 (I just moved from 15.2.13 to 15.2.17 yesterday, and I will move
to a more recent release as soon as this problem is solved).
Thanks
Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx