Hi,
if the daemon is up, marking it down will only have a temporary effect:
mon.nautilus2@0(leader).osd e16065 definitely_dead 0
mon.nautilus2@0(leader).osd e16065 do_prune osdmap full prune enabled
log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
mon.nautilus2@0(leader).osd e16066 e16066: 9 total, 8 up, 9 in
log_channel(audit) log [INF] : from='mgr.2794107 IP:0/2317957193'
entity='mgr.nautilus2' cmd='[{"prefix": "osd down", "format": "json",
"ids": ["0"]}]': finished
log_channel(cluster) log [DBG] : osdmap e16066: 9 total, 8 up, 9 in
log_channel(cluster) log [INF] : osd.0 marked itself dead as of e16066
mon.nautilus2@0(leader).osd e16066 do_prune osdmap full prune enabled
log_channel(cluster) log [INF] : Health check cleared: OSD_DOWN (was:
1 osds down)
log_channel(cluster) log [INF] : Cluster is now healthy
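In other words, as long as the osd.14 process is alive on its host, it will report itself back in and get marked up again. To keep it down, the daemon process itself has to stop. A quick way to verify whether it is actually still running (the fsid in the unit name is a placeholder, adjust it to your cluster):

ceph orch ps | grep osd.14
# or directly on the OSD host:
systemctl status ceph-<fsid>@osd.14.service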
What's the output of
ceph osd ok-to-stop 14
If it should be safe to stop it, inspect the mgr log to see why the orchestrator refuses to stop it. You can always fail the mgr (ceph mgr fail) and retry.
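A minimal sketch of that sequence (the mgr daemon name below is just an example, take the active one from 'ceph mgr stat' and run the cephadm command on that mgr's host):

ceph osd ok-to-stop 14
# identify the active mgr and check its log
ceph mgr stat
cephadm logs --name mgr.nautilus2
# fail over to a standby mgr and retry the stop
ceph mgr fail
ceph orch daemon stop osd.14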
Quoting Alan Murrell <Alan@xxxxxxxx>:
Hello,
I am running a Ceph cluster installed with cephadm. Version is
18.2.2 reef (stable).
I am moving the DB/WAL from my HDDs to SSD, and have been doing fine
on all the OSDs until I got to one in particular (osd.14).
From the cephadm shell, when I run 'ceph orch daemon stop osd.14',
nothing happens: it does not get marked as Down. If I mark it as
Down in the GUI, it does show as Down, then a few seconds later it
gets marked as Up again.
I was running 'journalctl -xf | grep osd.14' to see if any errors
came up, but nothing did.
Not sure where to check next to try to sort this out?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx