I would try one more thing (locally on that node):
cephadm unit stop --name osd.14
Does that work? If not, you could check the cephadm.log for more
hints. If it works, you can start the OSD again with:
cephadm unit start --name osd.14
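A minimal sketch of that local check, assuming the default cephadm log location under /var/log/ceph/ on the node:

  # stop the daemon through cephadm directly, bypassing the orchestrator
  cephadm unit stop --name osd.14

  # if that fails, the node-local cephadm log may have more hints
  tail -n 100 /var/log/ceph/cephadm.log

  # list the daemons on this host and their state as cephadm sees them
  cephadm ls

  # start the OSD again once the test is done
  cephadm unit start --name osd.14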
Quoting Alan Murrell <Alan@xxxxxxxx>:
Is there anything in the OSD logs?
When I issue 'ceph orch daemon stop osd.14', nothing shows up in
'/var/log/ceph/474264fe-b00e-11ee-b586-ac1f6b0ff21a/ceph-osd.14.log'
(I was doing a 'tail -f' on the log file at the time I issued the
daemon stop command, and absolutely nothing showed up in it).
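One way to narrow that down is to watch both ends at once; a rough sketch, where the 'cephadm' log channel is an assumption for a cephadm-managed cluster:

  # on the node: follow the OSD's own log
  tail -f /var/log/ceph/474264fe-b00e-11ee-b586-ac1f6b0ff21a/ceph-osd.14.log

  # from an admin host: issue the stop, then see what the orchestrator did with it
  ceph orch daemon stop osd.14
  ceph orch ps | grep osd.14       # daemon state as the orchestrator reports it
  ceph log last 50 info cephadm    # recent cephadm/mgr module messages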
When I fiddle with my test clusters, the podman/systemd control
sometimes breaks, but after a reboot it's usually fine.
I just rebooted the node and all the OSDs came up OK, but osd.14
will still not stop when I issue the 'ceph orch daemon stop osd.14'
command.
Does it stop if you simply run 'systemctl stop ceph-{CEPH_FSID}@osd.14'?
That *does* indeed appear to work (and running 'systemctl start
ceph-{CEPH_FSID}@osd.14' starts it back up). Using those commands,
I have been able to get the DB/WAL for osd.14 migrated off. It's
weird that there appears to be some sort of disconnect with the OSD
when using the 'ceph orch daemon' command, but as it seems to be
just this one OSD (for now...) and doesn't *really* impact anything,
I think I can live with it.
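For anyone hitting the same thing, a sketch of that workaround with the FSID filled in; the DB/WAL migration step at the end is only an assumption about how the move could be done (the OSD fsid and the target VG/LV are placeholders):

  # resolve the cluster FSID and drive the OSD through systemd directly
  FSID=$(ceph fsid)
  systemctl stop "ceph-${FSID}@osd.14"
  systemctl status "ceph-${FSID}@osd.14"

  # hypothetical DB/WAL migration while the OSD is down, run in the daemon's container context
  cephadm shell --name osd.14 -- ceph-volume lvm migrate \
      --osd-id 14 --osd-fsid <OSD_FSID> --from db wal --target <vg_name/lv_name>

  # bring the OSD back up afterwards
  systemctl start "ceph-${FSID}@osd.14"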
Thanks (again!) for your assistance and guidance.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx