Re: [ext] Re: cephadm orch thinks hosts are offline

We found a fix for our issue with ceph orch reporting wrong/outdated
service information:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/DAFXD46NALFAFUBQEYODRIFWSD6SH2OL/

In our case, the DNS settings were messed up on the cluster hosts AND 
also within the MGR daemon containers (cephadm deployed).
I'm not sure, but I could imagine that this also interferes with proper 
host detection.
So I guess it's worth at least checking the settings in 
/etc/resolv.conf on all your hosts and in the MGR containers.
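
A minimal sketch of such a check, assuming you have already copied the 
/etc/resolv.conf of an MGR container to a file on the host (for example 
by running `cat /etc/resolv.conf` inside the container). The function 
names are made up for illustration and are not part of ceph or cephadm:

```shell
#!/bin/sh
# Hypothetical helpers, not part of ceph or cephadm.

# Print the nameserver addresses from a resolv.conf-style file,
# one per line, sorted and de-duplicated.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1" | sort -u
}

# Succeed (exit status 0) if both files list the same set of nameservers.
same_nameservers() {
    [ "$(nameservers "$1")" = "$(nameservers "$2")" ]
}
```

Usage would be something like 
`same_nameservers /etc/resolv.conf /tmp/mgr-resolv.conf || echo "DNS settings differ"`, 
where /tmp/mgr-resolv.conf is an assumed path for the copy taken from 
the container. It only compares nameserver lines, not search domains or 
options.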

Best, Mathias

On 6/29/2022 5:59 PM, Mathias Kuhring wrote:
> Hey all,
>
> just want to note that I'm also looking for some kind of way to 
> restart/reset/refresh orchestrator.
> But in my case it's not the hosts but the services which are 
> presumably wrongly reported and outdated:
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/NHEVEM3ESJYXZ4LPJ24BBCK6NCG4QRHP/ 
>
>
> Don't know if this even can be related.
> But in case you find a solution, I'll just stick around here and check 
> if I can apply it.
>
> Best,
> Mathias
>
> On 6/27/2022 12:33 PM, Thomas Roth wrote:
>> Hi Adam,
>>
>> no, this is the 'feature' where the reboot of a mgr host causes all 
>> known hosts to become unmanaged.
>>
>>
>> > # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
>> > mgr.server reply reply (1) Operation not permitted check-host failed:
>> > Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.
>>
>> In some email on this issue that I can't find atm, someone describes a 
>> workaround that allows restarting the entire orchestrator business.
>> But that sounded risky.
>>
>> Regards
>> Thomas
>>
>>
>> On 23/06/2022 19.42, Adam King wrote:
>>> Hi Thomas,
>>>
>>> What happens if you run "ceph cephadm check-host <hostname>" for one of
>>> the hosts that is offline (and if that fails "ceph cephadm check-host
>>> <hostname> <ip-addr>")? Usually, the hosts get marked offline when some
>>> ssh connection to them fails. The check-host command will attempt a
>>> connection and maybe let us see why it's failing, or, if there is no
>>> longer an issue connecting to the host, should mark the host online
>>> again.
>>>
>>> Thanks,
>>>    - Adam King
>>>
>>> On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth <t.roth@xxxxxx> wrote:
>>>
>>>> Hi all,
>>>>
>>>> found this bug https://tracker.ceph.com/issues/51629 (Octopus 15.2.13),
>>>> reproduced it in Pacific and now again in Quincy:
>>>> - new cluster
>>>> - 3 mgr nodes
>>>> - reboot active mgr node
>>>> - (only in Quincy:) standby mgr node takes over, rebooted node becomes standby
>>>> - `ceph orch host ls` shows all hosts as `offline`
>>>> - add a new host: not offline
>>>>
>>>> In my setup, hostnames and IPs are well known, thus
>>>>
>>>> # ceph orch host ls
>>>> HOST      ADDR         LABELS  STATUS
>>>> lxbk0374  10.20.2.161  _admin  Offline
>>>> lxbk0375  10.20.2.162          Offline
>>>> lxbk0376  10.20.2.163          Offline
>>>> lxbk0377  10.20.2.164          Offline
>>>> lxbk0378  10.20.2.165          Offline
>>>> lxfs416   10.20.2.178          Offline
>>>> lxfs417   10.20.2.179          Offline
>>>> lxfs418   10.20.2.222          Offline
>>>> lxmds22   10.20.6.67
>>>> lxmds23   10.20.6.72           Offline
>>>> lxmds24   10.20.6.74           Offline
>>>>
>>>>
>>>> (All lxbk are mon nodes, the first 3 are mgr, 'lxmds22' was added after
>>>> the fatal reboot.)
>>>>
>>>>
>>>> Does this matter at all?
>>>> The old bug report is one year old, now with prio 'Low'. And some people
>>>> must have rebooted one host or another in their clusters...
>>>>
>>>> There is a cephfs on our cluster, operations seem to be unaffected.
>>>>
>>>>
>>>> Cheers
>>>> Thomas
>>>>
>>>> -- 
>>>> --------------------------------------------------------------------
>>>> Thomas Roth
>>>> Department: Informationstechnologie
>>>> Location: SB3 2.291
>>>>
>>>>
>>>> GSI Helmholtzzentrum für Schwerionenforschung GmbH
>>>> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
>>>>
>>>> Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
>>>> Managing Directors / Geschäftsführung:
>>>> Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
>>>> Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
>>>> State Secretary / Staatssekretär Dr. Volkmar Dietz
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>
>>>
>>
-- 
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail:  mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



