Hi,
have you tried a mgr failover? 'ceph mgr fail' should do the trick,
because restarting a mgr daemon won't fail it over. The logs of the
active mgr should also give hints about what is failing, e.g.
'cephadm logs --name mgr.<MGR>'.
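A minimal sketch of that sequence, in case it helps. The CEPH variable and
the mgr_failover function name are only illustrative (they are not part of
Ceph); the variable exists so the steps can be dry-run without a cluster.

```shell
#!/bin/sh
# CEPH is a hypothetical override: set CEPH="echo ceph" to print the
# commands instead of running them against a real cluster.
CEPH="${CEPH:-ceph}"

mgr_failover() {
    # Show which mgr is currently active.
    $CEPH mgr stat
    # Hand the active role over to a standby mgr.
    $CEPH mgr fail
    # Ask the orchestrator for a fresh view of the services.
    $CEPH orch ls --refresh
}
```

Calling mgr_failover with CEPH="echo ceph" set prints the three commands;
with the default it runs them. If the REFRESHED column still does not
update afterwards, the cephadm logs of the new active mgr are the next
place to look.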
Quoting Nicola Mori <mori@xxxxxxxxxx>:
Dear Ceph users,
after a host failure in my cluster (Quincy 17.2.3, managed by
cephadm) ceph orch seems to have gotten stuck and cannot operate.
For example, it has not refreshed the status of several services
for about 20 hours:
# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED   AGE  PLACEMENT
alertmanager               ?:9093,9094  1/1      3m ago      3M   count:1
crash                                   9/10     20h ago     3M   *
grafana                    ?:3000       1/1      3m ago      3M   count:1
mds.wizard_fs                           0/3      <deleting>  13h  bofur;balin;aka;count:3
mds.wizardfs                            2/3      20h ago     70m  bofur;balin;aka;count:3
mgr                                     2/2      20h ago     15m  bofur;balin;count:2
mon                                     4/5      20h ago     93m  bofur;balin;aka;romolo;dwalin;count:5
node-exporter              ?:9100       9/10     20h ago     3M   *
osd                                     24       3m ago      -    <unmanaged>
osd.all-available-devices               72       20h ago     4w   *
prometheus                 ?:9095       1/1      3m ago      3M   count:1
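As a rough way to pick out the stale entries from a listing like the one
above, the sketch below prints the names of services whose REFRESHED value
is given in hours or days. The function name stale_services is made up for
illustration, and the filter relies only on the plain-text "<N>h ago" /
"<N>d ago" form seen in this output.

```shell
# Reads `ceph orch ls` text on stdin and prints the first column of every
# line whose REFRESHED value is in hours or days. Matching on the literal
# " ago" suffix avoids depending on column positions (PORTS is often
# empty) and skips the AGE column, which has no " ago".
stale_services() {
    awk '/[0-9]+[hd] ago/ { print $1 }'
}
```

Usage would be along the lines of: ceph orch ls | stale_services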
The failed machine (named bifur) is offline but still in the cluster
since I'm planning to restore it:
# ceph orch host ls
HOST     ADDR           LABELS               STATUS
aka      172.16.253.7   _admin
balin    172.16.253.3
bifur    172.16.253.5   _admin               Offline
bofur    172.16.253.2   _admin
dwalin   172.16.253.10
ogion    172.16.253.6   _no_autotune_memory
prestno  172.16.253.9
remolo   172.16.253.1
rokanan  172.16.253.8
romolo   172.16.253.4
10 hosts in cluster
Since this machine hosted a mon I tried to redeploy it with:
# ceph orch apply mon --placement="5 bofur balin aka romolo dwalin"
but even though ceph orch ls shows that the mons should currently be
on the machines specified by --placement (see above), the mon on
bifur somehow still seems to be present in the ceph orch status,
e.g.
# ceph orch restart mon
Scheduled to restart mon.aka on host 'aka'
Scheduled to restart mon.balin on host 'balin'
Scheduled to restart mon.bifur on host 'bifur'
Scheduled to restart mon.bofur on host 'bofur'
Scheduled to restart mon.romolo on host 'romolo'
I manually restarted all the mon and mgr daemons on online hosts to
no avail. At this point I am clueless, so any help is greatly
appreciated.
Nicola