Re: cephadm orch thinks hosts are offline


Hi Adam,

No, this is the 'feature' where rebooting a mgr host causes all known hosts to become unmanaged.


> # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
> mgr.server reply reply (1) Operation not permitted check-host failed:
> Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.

In an email on this issue that I can't find at the moment, someone described a workaround that restarts the entire orchestrator.
But that sounded risky.

Regards
Thomas


On 23/06/2022 19.42, Adam King wrote:
Hi Thomas,

What happens if you run "ceph cephadm check-host <hostname>" for one of the
hosts that is offline (and if that fails "ceph cephadm check-host
<hostname> <ip-addr>")? Usually, the hosts get marked offline when some ssh
connection to them fails. The check-host command will attempt a connection
and maybe let us see why it's failing, or, if there is no longer an issue
connecting to the host, should mark the host online again.
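Adam's suggestion can be scripted for every offline host. This is a minimal sketch, assuming Python 3 on an admin node with the `ceph` CLI on PATH; the helper names `check_host_cmd` and `check_hosts` are hypothetical, and only the `ceph cephadm check-host <hostname> [<ip-addr>]` invocation and the "hostname first, then hostname plus IP" order come from this thread.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple


def check_host_cmd(host: str, addr: Optional[str] = None) -> List[str]:
    """Build argv for `ceph cephadm check-host <hostname> [<addr>]`."""
    cmd = ["ceph", "cephadm", "check-host", host]
    if addr:
        cmd.append(addr)
    return cmd


def check_hosts(
    hosts: Iterable[Tuple[str, Optional[str]]],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[Tuple[str, int, str]]:
    """Run check-host on each (hostname, addr) pair: first by hostname alone,
    and only if that fails, again with the IP address.

    Returns (hostname, exit code, combined stdout/stderr) so the reason for a
    failure can be read afterwards. `run` is injectable (hypothetical design
    choice) so the logic can be exercised without a cluster.
    """
    results = []
    for host, addr in hosts:
        proc = run(check_host_cmd(host), capture_output=True, text=True)
        if proc.returncode != 0 and addr:
            # Retry with the explicit address, the second form suggested above.
            proc = run(check_host_cmd(host, addr), capture_output=True, text=True)
        output = ((proc.stdout or "") + (proc.stderr or "")).strip()
        results.append((host, proc.returncode, output))
    return results
```

For example, `check_hosts([("lxbk0374", "10.20.2.161")])` repeats the check that Thomas ran by hand and keeps the error text for each host.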

Thanks,
   - Adam King

On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth <t.roth@xxxxxx> wrote:

Hi all,

I found this bug https://tracker.ceph.com/issues/51629 (Octopus 15.2.13),
reproduced it in Pacific, and
now again in Quincy:
- new cluster
- 3 mgr nodes
- reboot active mgr node
- (only in Quincy:) the standby mgr node takes over, the rebooted node becomes
standby
- `ceph orch host ls` shows all hosts as `offline`
- a newly added host is not shown as offline

In my setup, hostnames and IPs are well known, thus

# ceph orch host ls
HOST      ADDR         LABELS  STATUS
lxbk0374  10.20.2.161  _admin  Offline
lxbk0375  10.20.2.162          Offline
lxbk0376  10.20.2.163          Offline
lxbk0377  10.20.2.164          Offline
lxbk0378  10.20.2.165          Offline
lxfs416   10.20.2.178          Offline
lxfs417   10.20.2.179          Offline
lxfs418   10.20.2.222          Offline
lxmds22   10.20.6.67
lxmds23   10.20.6.72           Offline
lxmds24   10.20.6.74           Offline


(All lxbk hosts are mon nodes, the first three are also mgr nodes; 'lxmds22' was
added after the fatal reboot.)
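To pick out the affected hosts from a listing like the one above, the plain-text table can be parsed. This is a minimal sketch, assuming the column layout shown here (HOST, ADDR, optional LABELS, optional STATUS last) and that an offline host carries the literal status `Offline`; `parse_offline` is a hypothetical helper, and where available, a structured output format of `ceph orch host ls` is likely more robust than parsing the table.

```python
from typing import List, Tuple


def parse_offline(table: str) -> List[Tuple[str, str]]:
    """Return (hostname, addr) for every row whose STATUS column is 'Offline'.

    Assumes whitespace-separated columns with HOST first, ADDR second and
    STATUS, when present, last; the LABELS column may be empty.
    """
    offline = []
    for line in table.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and short/irrelevant lines.
        if len(fields) < 2 or fields[0] == "HOST":
            continue
        if fields[-1] == "Offline":
            offline.append((fields[0], fields[1]))
    return offline
```

Applied to the listing above, this returns ten (hostname, addr) pairs: every host except `lxmds22`.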


Does this matter at all?
The old bug report is a year old and now has priority 'Low'. And surely some
people must have rebooted one host or another in their clusters...

There is a CephFS on our cluster; operations seem to be unaffected.


Cheers
Thomas

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
