Hey all,

just want to note that I'm also looking for some way to
restart/reset/refresh the orchestrator. But in my case it's not the hosts
but the services which are presumably wrongly reported and outdated:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NHEVEM3ESJYXZ4LPJ24BBCK6NCG4QRHP/

I don't know if this is even related. But in case you find a solution,
I'll stick around here and check if I can apply it.

Best, Mathias

On 6/27/2022 12:33 PM, Thomas Roth wrote:
> Hi Adam,
>
> no, this is the 'feature' where the reboot of a mgr host causes all
> known hosts to become unmanaged.
>
> # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
> mgr.server reply reply (1) Operation not permitted check-host failed:
> Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.
>
> In some email on this issue I can't find at the moment, someone describes
> a workaround that allows restarting the entire orchestrator business.
> But that sounded risky.
>
> Regards
> Thomas
>
> On 23/06/2022 19.42, Adam King wrote:
>> Hi Thomas,
>>
>> What happens if you run "ceph cephadm check-host <hostname>" for one of
>> the hosts that is offline (and if that fails, "ceph cephadm check-host
>> <hostname> <ip-addr>")? Usually, the hosts get marked offline when some
>> ssh connection to them fails. The check-host command will attempt a
>> connection and maybe let us see why it's failing, or, if there is no
>> longer an issue connecting to the host, it should mark the host online
>> again.
>>
>> Thanks,
>> - Adam King
>>
>> On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth <t.roth@xxxxxx> wrote:
>>
>>> Hi all,
>>>
>>> found this bug https://tracker.ceph.com/issues/51629 (Octopus 15.2.13),
>>> reproduced it in Pacific and now again in Quincy:
>>> - new cluster
>>> - 3 mgr nodes
>>> - reboot active mgr node
>>> - (only in Quincy:) standby mgr node takes over, rebooted node becomes
>>>   standby
>>> - `ceph orch host ls` shows all hosts as `offline`
>>> - add a new host: not offline
>>>
>>> In my setup, hostnames and IPs are well known, thus
>>>
>>> # ceph orch host ls
>>> HOST      ADDR         LABELS  STATUS
>>> lxbk0374  10.20.2.161  _admin  Offline
>>> lxbk0375  10.20.2.162          Offline
>>> lxbk0376  10.20.2.163          Offline
>>> lxbk0377  10.20.2.164          Offline
>>> lxbk0378  10.20.2.165          Offline
>>> lxfs416   10.20.2.178          Offline
>>> lxfs417   10.20.2.179          Offline
>>> lxfs418   10.20.2.222          Offline
>>> lxmds22   10.20.6.67
>>> lxmds23   10.20.6.72           Offline
>>> lxmds24   10.20.6.74           Offline
>>>
>>> (All lxbk are mon nodes, the first 3 are mgr; 'lxmds22' was added after
>>> the fatal reboot.)
>>>
>>> Does this matter at all?
>>> The old bug report is one year old, now with priority 'Low'. And some
>>> people must have rebooted the one or other host in their clusters...
>>>
>>> There is a CephFS on our cluster; operations seem to be unaffected.
>>>
>>> Cheers
>>> Thomas
>>>
>>> --
>>> --------------------------------------------------------------------
>>> Thomas Roth
>>> Department: Informationstechnologie
>>> Location: SB3 2.291
>>>
>>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>>> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>>>
>>> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>> Managing Directors / Geschäftsführung:
>>> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
>>> Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
>>> State Secretary / Staatssekretär Dr.
Volkmar Dietz
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
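
P.S. For anyone searching for the "restart the entire orchestrator" workaround
mentioned above: below is the sequence commonly suggested on this list. This is
a sketch, not an endorsed procedure - it assumes the cephadm backend is in use,
and you should check it against the documentation for your Ceph release before
running it on a production cluster:

```shell
# Least invasive first: fail over to a standby mgr, which restarts
# all mgr modules (including cephadm) on the new active mgr.
ceph mgr fail

# If the state is still stale, fully restart the cephadm module.
# Note: this temporarily stops all orchestrator activity.
ceph orch set backend ''
ceph mgr module disable cephadm
ceph mgr module enable cephadm
ceph orch set backend cephadm

# Then re-check host reachability, e.g.:
ceph cephadm check-host lxbk0374 10.20.2.161
ceph orch host ls
```

If hosts are still shown as Offline afterwards, the underlying ssh connectivity
(keys, firewall, DNS) is worth checking before anything more drastic.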