Hey all,

just want to note that I'm also looking for some way to
restart/reset/refresh the orchestrator. But in my case it's not the hosts
but the services which are presumably wrongly reported and outdated:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NHEVEM3ESJYXZ4LPJ24BBCK6NCG4QRHP/

I don't know if this is even related. But in case you find a solution,
I'll stick around here and check if I can apply it.

Best, Mathias

On 6/27/2022 12:33 PM, Thomas Roth wrote:
> Hi Adam,
>
> no, this is the 'feature' where the reboot of a mgr host causes all
> known hosts to become unmanaged.
>
> # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
> mgr.server reply reply (1) Operation not permitted check-host failed:
> Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.
>
> In some email on this issue I can't find at the moment, someone describes
> a workaround that allows restarting the entire orchestrator business.
> But that sounded risky.
>
> Regards
> Thomas
>
> On 23/06/2022 19.42, Adam King wrote:
>> Hi Thomas,
>>
>> What happens if you run "ceph cephadm check-host <hostname>" for one of
>> the hosts that is offline (and if that fails, "ceph cephadm check-host
>> <hostname> <ip-addr>")? Usually, the hosts get marked offline when some
>> ssh connection to them fails. The check-host command will attempt a
>> connection and maybe let us see why it's failing, or, if there is no
>> longer an issue connecting to the host, it should mark the host online
>> again.
>>
>> Thanks,
>> - Adam King
>>
>> On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth <t.roth@xxxxxx> wrote:
>>
>>> Hi all,
>>>
>>> found this bug https://tracker.ceph.com/issues/51629 (Octopus 15.2.13),
>>> reproduced it in Pacific and now again in Quincy:
>>> - new cluster
>>> - 3 mgr nodes
>>> - reboot active mgr node
>>> - (only in Quincy:) standby mgr node takes over, rebooted node becomes
>>>   standby
>>> - `ceph orch host ls` shows all hosts as `offline`
>>> - add a new host: not offline
>>>
>>> In my setup, hostnames and IPs are well known, thus
>>>
>>> # ceph orch host ls
>>> HOST      ADDR         LABELS  STATUS
>>> lxbk0374  10.20.2.161  _admin  Offline
>>> lxbk0375  10.20.2.162          Offline
>>> lxbk0376  10.20.2.163          Offline
>>> lxbk0377  10.20.2.164          Offline
>>> lxbk0378  10.20.2.165          Offline
>>> lxfs416   10.20.2.178          Offline
>>> lxfs417   10.20.2.179          Offline
>>> lxfs418   10.20.2.222          Offline
>>> lxmds22   10.20.6.67
>>> lxmds23   10.20.6.72           Offline
>>> lxmds24   10.20.6.74           Offline
>>>
>>> (All lxbk are mon nodes, the first 3 are mgr; 'lxmds22' was added after
>>> the fatal reboot.)
>>>
>>> Does this matter at all?
>>> The old bug report is one year old, now with priority 'Low'. And some
>>> people must have rebooted the one or other host in their clusters...
>>>
>>> There is a CephFS on our cluster; operations seem to be unaffected.
>>>
>>> Cheers
>>> Thomas
>>>
>>> --
>>> --------------------------------------------------------------------
>>> Thomas Roth
>>> Department: Informationstechnologie
>>> Location: SB3 2.291
>>>
>>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>>> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>>>
>>> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>> Managing Directors / Geschäftsführung:
>>> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
>>> Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
>>> State Secretary / Staatssekretär Dr.
Volkmar Dietz
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
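
P.S. For anyone searching for the "restart the entire orchestrator" workaround
mentioned above: below is the sequence commonly suggested on this list. This is
a sketch, not an endorsed procedure - it assumes the cephadm backend is in use,
and you should check it against the documentation for your Ceph release before
running it on a production cluster:

```shell
# Least invasive first: fail over to a standby mgr, which restarts
# all mgr modules (including cephadm) on the new active mgr.
ceph mgr fail

# If the state is still stale, fully restart the cephadm module.
# Note: this temporarily stops all orchestrator activity.
ceph orch set backend ''
ceph mgr module disable cephadm
ceph mgr module enable cephadm
ceph orch set backend cephadm

# Then re-check host reachability, e.g.:
ceph cephadm check-host lxbk0374 10.20.2.161
ceph orch host ls
```

If hosts are still shown as Offline afterwards, the underlying ssh connectivity
(keys, firewall, DNS) is worth checking before anything more drastic.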