Re: ceph orch cannot refresh


Hi,

have you tried a mgr failover? 'ceph mgr fail' should do the trick, because restarting a mgr daemon won't fail it over. The logs of the active mgr should contain hints about what is failing, e.g. cephadm logs --name mgr.<MGR>.
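
As a rough sketch of the sequence suggested here, the steps could be wrapped in a small shell function. The function name run_failover and the FAILOVER_WAIT delay are made up for illustration, and the sketch assumes it is run on a host with an admin keyring. The 'ceph orch ps --refresh' call is an addition that asks the orchestrator to re-scan its daemons once the new mgr is active.

```shell
#!/bin/sh
# Sketch only: fail over the active mgr, then ask cephadm for fresh data.
# run_failover and FAILOVER_WAIT are hypothetical names, not part of Ceph.
run_failover() {
    # Show which mgr is currently active (JSON output with "active_name").
    ceph mgr stat || return 1
    # Fail the active mgr so that a standby takes over.
    ceph mgr fail || return 1
    # Give the standby a moment to become active (the delay is a guess).
    sleep "${FAILOVER_WAIT:-10}"
    # Ask the orchestrator to re-scan daemons instead of serving cached state.
    ceph orch ps --refresh
}
```

After calling run_failover, 'ceph orch ls' should show recent REFRESHED times if the orchestrator has recovered. If not, the next place to look would be 'cephadm logs --name mgr.<MGR>' on the host of the newly active mgr.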

Quoting Nicola Mori <mori@xxxxxxxxxx>:

Dear Ceph users,

after a host failure in my cluster (quincy 17.2.3 managed by cephadm) it seems that ceph orch somehow got stuck and cannot operate. For example, it has not refreshed the status of several services for about 20 hours:

# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED   AGE  PLACEMENT
alertmanager               ?:9093,9094      1/1  3m ago      3M   count:1
crash                                      9/10  20h ago     3M   *
grafana                    ?:3000           1/1  3m ago      3M   count:1
mds.wizard_fs                               0/3  <deleting>  13h  bofur;balin;aka;count:3
mds.wizardfs                                2/3  20h ago     70m  bofur;balin;aka;count:3
mgr                                         2/2  20h ago     15m  bofur;balin;count:2
mon                                         4/5  20h ago     93m  bofur;balin;aka;romolo;dwalin;count:5
node-exporter              ?:9100          9/10  20h ago     3M   *
osd                                          24  3m ago      -    <unmanaged>
osd.all-available-devices                    72  20h ago     4w   *
prometheus                 ?:9095           1/1  3m ago      3M   count:1

The failed machine (named bifur) is offline but still in the cluster since I'm planning to restore it:

# ceph orch host ls
HOST     ADDR           LABELS               STATUS
aka      172.16.253.7   _admin
balin    172.16.253.3
bifur    172.16.253.5   _admin               Offline
bofur    172.16.253.2   _admin
dwalin   172.16.253.10
ogion    172.16.253.6   _no_autotune_memory
prestno  172.16.253.9
remolo   172.16.253.1
rokanan  172.16.253.8
romolo   172.16.253.4
10 hosts in cluster

Since this machine hosted a mon I tried to redeploy it with:

# ceph orch apply mon --placement="5 bofur balin aka romolo dwalin"

but even though ceph orch ls shows that the mons should currently be on the machines specified by --placement (see above), the mon on bifur still seems to be known to ceph orch, e.g.

# ceph orch restart mon
Scheduled to restart mon.aka on host 'aka'
Scheduled to restart mon.balin on host 'balin'
Scheduled to restart mon.bifur on host 'bifur'
Scheduled to restart mon.bofur on host 'bofur'
Scheduled to restart mon.romolo on host 'romolo'

I manually restarted all the mon and mgr daemons on online hosts to no avail. At this point I am clueless, so any help is greatly appreciated.

Nicola


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


