Re: Issue adding host with cephadm - nothing is deployed

Hi Adam,

Thanks for the ideas! I tried everything you mentioned. In short, cephadm isn't removing services either, although all the other checks come back fine.

I changed the node-exporter specification (on the dashboard) to only be placed on three of the old, working hosts. I see in the cephadm logs:
[INF] Saving service node-exporter spec with placement host01, host02, host03
[DBG] _kick_serve_loop

But nothing else happens. I restarted the active mgr and I can see the hosts being re-inventoried, then again nothing. The node-exporter size is 3, and running remains 4.

For the other checks:

  *   ceph orch host ls - all 8 hosts show
  *   ceph orch device ls - all disks on all hosts show, including the four new hosts.
  *   I ran cephadm shell -- ceph-volume inventory on all four new hosts, with no errors or hangs
  *   ceph orch ls --format yaml -> matches the dashboard view (i.e. shows node-exporter with placement on hosts 01, 02, 03 with size: 3 and running: 3)
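
The size versus running comparison can be scripted rather than read off the dashboard. A minimal Python sketch, assuming `ceph orch ls --format json` returns a list of services that each carry a `service_name` and a `status` object with `size` and `running` fields (the field names are an assumption, not confirmed in this thread). The sample input is hand-written to mirror the node-exporter counts reported here; it is not real cluster output.

```python
import json


def find_unconverged_services(orch_ls_json):
    """Return (service_name, size, running) for services whose counts differ.

    Assumes each entry has a "status" object with "size" and "running";
    entries without both fields are skipped rather than guessed at.
    """
    mismatched = []
    for service in json.loads(orch_ls_json):
        status = service.get("status", {})
        size = status.get("size")
        running = status.get("running")
        if size is None or running is None:
            continue
        if size != running:
            mismatched.append((service.get("service_name", "?"), size, running))
    return mismatched


# Illustrative sample only: node-exporter counts taken from this message,
# the second entry is made up to show a converged service.
sample = json.dumps([
    {"service_name": "node-exporter", "status": {"size": 3, "running": 4}},
    {"service_name": "crash", "status": {"size": 4, "running": 4}},
])

print(find_unconverged_services(sample))
```

On a live cluster the string passed in would be the captured output of `ceph orch ls --format json` instead of the sample.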

I'm running ceph -W cephadm with log_to_cluster_level set to debug, but apart from the walls of inventory text, nothing (except _kick_serve_loop) shows up in the logs after the INF-level messages saying that a host has been added or a service specification has been saved.

Best


________________________________
From: Adam King 'adking at redhat.com'
Sent: 18 August 2022 16:09
To: ceph-mail@xxxxxxxxxxxxxxxx <ceph-mail@xxxxxxxxxxxxxxxx>
Subject: Re:  Issue adding host with cephadm - nothing is deployed

If you try shuffling some daemon around on some of the working hosts (e.g. changing the placement of the node-exporter spec so that one of the working hosts is excluded so the node-exporter there should be removed) is cephadm able to actually complete that? Also, does device info for any or all of these hosts show up in `ceph orch device ls`? I know there's been an issue people have run into occasionally where ceph-volume inventory (which cephadm uses to gather device info) was hanging on one of the hosts and it was causing things to get stuck until the host was removed or whatever caused the hang was fixed. There could also be something interesting/useful in the output of `ceph log last 200 cephadm`, `ceph orch host ls` and `ceph orch ls --format yaml`. Some traceback or even just seeing what the last thing logged was could be useful.
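
The commands named above can be gathered in one pass so that their output is easy to paste into a reply. A minimal Python sketch, assuming the `ceph` CLI is on the PATH of a node with an admin keyring; the helper names `run_command` and `collect_diagnostics` are hypothetical, and the 60-second timeout is an arbitrary choice so that a hanging command does not block the whole run.

```python
import shlex
import subprocess

# Commands taken from the message above.
DIAG_COMMANDS = [
    "ceph orch host ls",
    "ceph orch device ls",
    "ceph orch ls --format yaml",
    "ceph log last 200 cephadm",
]


def run_command(command, timeout=60):
    """Run one command and return its stdout and stderr as text.

    A timeout or a missing binary is reported as a short note instead of
    raising, so the remaining commands still run.
    """
    try:
        result = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"(timed out after {timeout}s)"
    except FileNotFoundError:
        return "(command not found)"
    return result.stdout + result.stderr


def collect_diagnostics(commands=DIAG_COMMANDS, runner=run_command):
    """Return one text report with each command followed by its output."""
    sections = []
    for command in commands:
        sections.append(f"$ {command}\n{runner(command).rstrip()}\n")
    return "\n".join(sections)
```

Calling `print(collect_diagnostics())` produces the report; the `runner` parameter exists only so the function can be exercised without a cluster.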

On Thu, Aug 18, 2022 at 9:31 AM <ceph-mail@xxxxxxxxxxxxxxxx> wrote:
Hi again all,

I have a new issue with ceph/cephadm/quincy and hopefully someone can assist.

I have a cluster of four hosts that I (finally) managed to bootstrap. I'm now trying to add several additional hosts. Whether I add the hosts from the dashboard or the CLI, I get the same result: the host is added but no services are deployed.

I have tried

  *   confirming that the ssh connection works
  *   enabling debug logging of cephadm and watching the output
  *   when adding the host, cephadm logs in to the host and runs check-host
     *   the debug output shows an error on podman 3.0.1, but it's the same version as on all the working hosts
     *   it confirms systemctl, lvcreate, chrony.service, hostname and concludes Host looks OK
     *   it outputs Added host
  *   The host is visible in the list of hosts in the dashboard and in ceph orch host ls, but no services are deployed, so there is no data under model, CPUs etc.
  *   I have tried to edit and save the node-exporter and crash services (both have * for placement)
  *   I have tried to redeploy the node-exporter service, they just get redeployed to the existing four hosts
  *   I have tried ceph orch pause and then ceph orch resume
  *   I have tried to put the host in maintenance mode and exit the maintenance mode
  *   I tried to restart all the mons and mgrs - when restarting the active mgr, cephadm finally did the inventory of the host I added. When I added the additional hosts, nothing happened until I restarted the mgr again. The service size for crash and node-exporter increased to 8, but still no services are deployed and the running number remains at 4.
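
One way to pin down which hosts were added but never received a daemon is to compare the host list with the daemon list. A minimal Python sketch, assuming `ceph orch host ls --format json` and `ceph orch ps --format json` each return a list of objects with a `hostname` field (`ceph orch ps` is not mentioned in this thread, and the field name is an assumption). The sample data is hand-written; host05 is a made-up name in the style of the hosts above.

```python
import json


def hosts_without_daemons(host_ls_json, ps_json):
    """Return the sorted hostnames the orchestrator knows but runs nothing on."""
    known_hosts = {host["hostname"] for host in json.loads(host_ls_json)}
    hosts_with_daemons = {
        daemon["hostname"]
        for daemon in json.loads(ps_json)
        if "hostname" in daemon
    }
    return sorted(known_hosts - hosts_with_daemons)


# Illustrative sample only, not real cluster output.
sample_hosts = json.dumps(
    [{"hostname": name} for name in ("host01", "host02", "host03", "host05")]
)
sample_daemons = json.dumps([
    {"hostname": "host01", "daemon_type": "node-exporter"},
    {"hostname": "host02", "daemon_type": "node-exporter"},
    {"hostname": "host03", "daemon_type": "crash"},
])

print(hosts_without_daemons(sample_hosts, sample_daemons))
```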

I could see no error anywhere (except the note about podman 3.0.1) and I didn't have this problem when adding the first three hosts after the bootstrap.

Ideas?

Thanks

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

