Re: Issue adding host with cephadm - nothing is deployed

Okay, the fact that the removal is also not working means the idea of it
being "stuck" in some way is likely correct. The most likely culprit in
these scenarios in the past, as mentioned previously, is a hanging
ceph-volume command. It might be worth going to each of these new hosts and
running something like `ps aux | grep ceph-volume` to see if there are any
processes that have been around for a while, especially any in D state. If
you see something like that, it probably means something is going on with
the devices or mount points on that host that is causing ceph-volume to
hang when checking them. A reboot often fixes it in that case.
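
For example (just a sketch, run as root on each of the new hosts), something
like this should surface long-running or uninterruptible ceph-volume
processes:

    # list ceph-volume processes with their state and elapsed time
    ps -eo pid,stat,etime,cmd | grep '[c]eph-volume'
    # list anything stuck in uninterruptible (D) state
    ps -eo pid,stat,cmd | awk '$2 ~ /^D/'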

On Thu, Aug 18, 2022 at 11:12 AM <ceph-mail@xxxxxxxxxxxxxxxx> wrote:

> Hi Adam,
>
> Thanks for the ideas! I tried all the things you mentioned, but in short:
> cephadm isn't removing services either, although everything else works.
>
> I changed the node-exporter specification (on the dashboard) to only be
> placed on three of the old, working hosts. I see in the cephadm logs:
> [INF] Saving service node-exporter spec with placement host01, host02,
> host03
> [DBG] _kick_serve_loop
>
> But nothing else happens. I restarted the active mgr and I can see the
> hosts being re-inventoried, then again nothing. The node-exporter size is
> 3, and running remains 4.
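>
> For reference, roughly the same check from the CLI (just a sketch; the
> host names are the ones from the spec above):
>
>     # export the saved node-exporter spec and see where daemons actually run
>     ceph orch ls node-exporter --export
>     ceph orch ps --daemon_type node-exporter --refresh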
>
> For the other checks
>
>   *   ceph orch host ls - all 8 hosts show
>   *   ceph orch device ls - all disks on all hosts show, including the
> four new hosts.
>   *   I ran cephadm shell -- ceph-volume inventory on all four new hosts,
> no errors or hangs (a timeout-wrapped version is sketched after this list)
>   *   ceph orch ls --format yaml -> matches the dashboard view (i.e. shows
> node-exporter with placement on hosts 01, 02, 03 with size: 3 and running:
> 3)
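>
> A rough sketch of the timeout-wrapped inventory check, in case it helps
> rule out a silent hang (run on each new host):
>
>     # give the inventory a hard time limit
>     timeout 120 cephadm shell -- ceph-volume inventory
>     echo $?    # an exit status of 124 would mean it hung and was killed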
>
> I'm running ceph -W cephadm with log_to_cluster_level set to debug, but
> apart from the walls of text with the inventories, nothing (except
> _kick_serve_loop) shows up in the logs after the INF-level messages that
> the host has been added or the service specification has been saved.
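>
> (For completeness, the debug logging is set up roughly like this:)
>
>     ceph config set mgr mgr/cephadm/log_to_cluster_level debug
>     ceph -W cephadm --watch-debug
>     # and back to the default afterwards:
>     ceph config set mgr mgr/cephadm/log_to_cluster_level info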
>
> Best
>
>
> ________________________________
> From: Adam King 'adking at redhat.com'
> Sent: 18 August 2022 16:09
> To: ceph-mail@xxxxxxxxxxxxxxxx <ceph-mail@xxxxxxxxxxxxxxxx>
> Subject: Re:  Issue adding host with cephadm - nothing is
> deployed
>
> If you try shuffling some daemon around on some of the working hosts (e.g.
> changing the placement of the node-exporter spec to exclude one of the
> working hosts, so that the node-exporter there should be removed), is
> cephadm able to actually complete that? Also, does device info for any or
> all of these hosts show up in `ceph orch device ls`? I know there's been an
> issue people have run into occasionally where ceph-volume inventory (which
> cephadm uses to gather device info) was hanging on one of the hosts, and it
> caused things to get stuck until the host was removed or whatever caused
> the hang was fixed. There could also be something interesting/useful in the
> output of `ceph log last 200 cephadm`, `ceph orch host ls` and `ceph orch
> ls --format yaml`. A traceback, or even just seeing what the last thing
> logged was, could help.
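>
> As a concrete example (host names are just placeholders), the placement
> shuffle could look something like this:
>
>     # drop one working host from the node-exporter placement and watch
>     # whether cephadm actually removes the daemon there
>     ceph orch apply node-exporter --placement="host01 host02 host03"
>     ceph orch ps --daemon_type node-exporter --refresh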
>
> On Thu, Aug 18, 2022 at 9:31 AM <ceph-mail@xxxxxxxxxxxxxxxx> wrote:
> Hi again all,
>
> I have a new issue with ceph/cephadm/quincy and hopefully someone can
> assist.
>
> I have a cluster of four hosts that I (finally) managed to bootstrap. I'm
> now trying to add several additional hosts. Whether I add the hosts from
> the dashboard or the CLI, I get the same result - the host is added but no
> services are deployed.
>
> I have tried
>
>   *   confirming that the ssh connection works
>   *   enabled debug logging of cephadm and I'm watching the output
>   *   when adding the host, cephadm logs in to the host and runs check-host
>      *   the debug output shows an error about podman 3.0.1, but it's the
> same version as on all the working hosts
>      *   it confirms systemctl, lvcreate, chrony.service, hostname and
> concludes Host looks OK
>      *   it outputs Added host
>   *   The host is visible in the list of hosts in the dashboard and in
> ceph orch host ls, but no services are deployed, so there is no data under
> model, CPUs etc.
>   *   I have tried to edit and save the node-exporter and crash services
> (both have * for placement)
>   *   I have tried to redeploy the node-exporter service; the daemons
> just get redeployed to the existing four hosts
>   *   I have tried ceph orch pause and then ceph orch resume
>   *   I have tried to put the host in maintenance mode and exit the
> maintenance mode
>   *   I tried to restart all the mons and mgrs - when restarting the
> active mgr, cephadm finally did the inventory of the host I added. When I
> added the additional hosts, nothing happened until I restarted the mgr
> again. The service size for crash and node-exporter increased to 8, but
> still no services are deployed and the running number remains at 4.
>
> I could see no error anywhere (except the note about podman 3.0.1) and I
> didn't have this problem when adding the first three hosts after the
> bootstrap.
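>
> In case the exact commands matter, the CLI version of what I tried looks
> roughly like this (host name and IP are placeholders):
>
>     ceph orch host add host05 192.0.2.15
>     ceph mgr fail                    # one way to bounce the active mgr
>     ceph orch ps host05 --refresh    # check whether anything got deployed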
>
> Ideas?
>
> Thanks
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


