Re: cephadm / ceph orch : indefinite hang adding hosts to new cluster

Lincoln Bryant <lincolnb@xxxxxxxxxxxx> · Wed, 17 Nov 2021 15:18:06 +0000

Hi,

Yes, the hosts have internet access and other Ceph commands work successfully. Every host we have tried has worked for bootstrap, but adding another node to the cluster hangs. We've also tried adding intentionally bad hosts and get the expected failures (missing SSH key, etc.).

Here's some check-host output for our mons:

[root@kvm-mon03 ~]# cephadm check-host
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
[root@kvm-mon02 ~]#  cephadm check-host
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
[root@kvm-mon01 ~]# cephadm check-host
podman (/usr/bin/podman) version 3.4.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK

Tailing the logs, journalctl simply reports:

 sshd[8135]: Accepted publickey for root from 192.168.7.13 port 41378 ssh2: RSA SHA256:JZxTh1Su9A+cqx14cIxzbP2W0vRHwgNcGQioLPCMFtk
 systemd-logind[2155]: New session 17 of user root.
 systemd[1]: Started Session 17 of user root.
 sshd[8135]: pam_unix(sshd:session): session opened for user root by (uid=0)

Very strange...

Maybe a manual installation will reveal issues?

--Lincoln
________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, November 17, 2021 2:27 AM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  Re: cephadm / ceph orch : indefinite hang adding hosts to new cluster

Hi,

> This is the only logging output we see:
> # cephadm shell -- ceph orch host add kvm-mon02 192.168.7.12
> Inferring fsid 826b9b36-4729-11ec-99f0-c81f66d05a38
> Using recent ceph image quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3
>
> This command hangs indefinitely until killed via podman kill.

The first thought that comes to mind: do the hosts have internet access
to download the container images? But if the bootstrap worked, the
answer would be yes. And IIUC you tried different hosts for bootstrap
and all of them worked? Just for the record, does a manual 'podman pull
...' on the second host work, too?

What does a 'cephadm check-host' on the second host report?
The syslog on the second node should usually reveal errors; have you
checked 'journalctl -f' during the attempt to add it?

Quoting Lincoln Bryant <lincolnb@xxxxxxxxxxxx>:

> Greetings list,
>
> We have a new Ceph cluster we are trying to deploy on EL8 (CentOS
> Stream) using cephadm (+podman), targeting Pacific.
>
> We are successfully able to bootstrap the first host, but attempting
> to add any additional hosts hangs indefinitely. We have confirmed
> that we are able to SSH from the first host to subsequent hosts
> using the key generated by Ceph.
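
For comparison, a minimal sketch of repeating that SSH check with exactly the configuration and key the mgr itself uses, rather than a key copied by hand. It assumes the `ceph` CLI is reachable where it runs (for example inside `cephadm shell`). The function name `check_mgr_ssh` is made up for illustration. The two `ceph` commands should be the ones the cephadm troubleshooting documentation gives for SSH problems; verify them against your release.

```shell
# Hypothetical helper: SSH to a host with the mgr's own ssh_config and identity key.
# Assumption: `ceph` and `ssh` are both available in the current environment.
check_mgr_ssh() {
    host="$1"
    workdir="$(mktemp -d)" || return 1
    # Export the SSH client config and private key that cephadm stores in the cluster.
    ceph cephadm get-ssh-config > "$workdir/ssh_config" || return 1
    ceph config-key get mgr/cephadm/ssh_identity_key > "$workdir/key" || return 1
    chmod 0600 "$workdir/key"
    # BatchMode prevents interactive prompts; ConnectTimeout bounds connection setup.
    ssh -F "$workdir/ssh_config" -i "$workdir/key" \
        -o BatchMode=yes -o ConnectTimeout=10 \
        "root@$host" 'python3 --version'
    rc=$?
    rm -rf "$workdir"
    return "$rc"
}
```

Usage would be `check_mgr_ssh 192.168.7.12`. If that returns promptly with a Python version while `ceph orch host add` still hangs, plain SSH connectivity is probably not the problem.
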
>
> This is the only logging output we see:
> # cephadm shell -- ceph orch host add kvm-mon02 192.168.7.12
> Inferring fsid 826b9b36-4729-11ec-99f0-c81f66d05a38
> Using recent ceph image quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3
>
> This command hangs indefinitely until killed via podman kill.
>
> Inspecting the host we're trying to add, we see that Ceph has
> launched a python process:
> root        3604  0.0  0.0 164128  6316 ?        S    16:36   0:00  |   \_ sshd: root@notty
> root        3605  0.0  0.0  31976  8752 ?        Ss   16:36   0:00  |       \_ python3 -c import sys;exec(eval(sys.stdin.readline()))
>
> Inside of the mgr container, we see 2 SSH connections:
> ceph         186  0.0  0.0  44076  6676 ?        S    22:31   0:00  \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i /tmp/cephadm-identity-8ku7ib6b root@192.168.7.13 python3 -c "import sys;exec(eval(sys.stdin.readline()))"
> ceph         211  0.0  0.0  44076  6716 ?        S    22:36   0:00  \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i /tmp/cephadm-identity-8ku7ib6b root@192.168.7.12 python3 -c "import sys;exec(eval(sys.stdin.readline()))"
>
> where 192.168.7.13 is the IP of the first host in the cluster (which
> has successfully bootstrapped and is running mgr, mon, and so on),
> and 192.168.7.12 is the host we are unsuccessfully trying to add.
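
A small local sketch of what the python3 one-liner in the process listings does, with no Ceph or SSH involved. It reads a single line from stdin, evaluates it as a Python expression (here a quoted string literal), and executes the result as source code. The payload below is invented for illustration and is not what cephadm sends. That the one-liner is the bootstrap of the remote-execution library cephadm uses in Pacific is an assumption.

```shell
# Feed one line to the same one-liner seen in ps.
# The line is a Python string literal; eval() turns it into source, exec() runs it.
bootstrap_demo() {
    printf '%s\n' "'print(\"bootstrap ok\")'" |
        python3 -c 'import sys;exec(eval(sys.stdin.readline()))'
}

bootstrap_demo   # prints: bootstrap ok
```

Run with no input, the same command simply blocks in readline(). A sleeping python3 under sshd on the target host is therefore at least consistent with the remote side waiting on the mgr, though it does not prove where the hang is.
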
>
> The mgr logs show nothing particularly interesting except for:
> debug 2021-11-16T22:39:03.570+0000 7fb6e4914700  0 [progress WARNING root] complete: ev de058df7-b54a-4429-933a-99abe7796715 does not exist
> debug 2021-11-16T22:39:03.571+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 61fe4998-4ef4-4640-8a13-2b0928da737f does not exist
> debug 2021-11-16T22:39:03.571+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 2323b586-5262-497e-b318-42702c0dc3dc does not exist
> debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev f882757b-e03a-4798-8fae-4d55e2d1f531 does not exist
> debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 3d35e91f-b475-4dbc-a040-65658e82fe67 does not exist
> debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 694ed4e6-b741-4bf3-9f7f-b11c549aca87 does not exist
> debug 2021-11-16T22:39:03.573+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 1ce718f4-a23b-4a27-8f7c-bc2610340403 does not exist
>
> We've tried purging/reinstalling several times to no avail, and also
> tried swapping which host was used as the initial bootstrap mon and
> so on. I've also tried the option to disable the monitoring stack
> and that did not help either.
>
> In any case, we are not sure how to proceed from here. Is there
> anything we can do to turn up logging verbosity, or other things to
> check? I've tried to find the ceph orch source code to try to
> understand what may be happening, but I'm not sure where to look.
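
On the verbosity question, a minimal sketch of raising the cephadm module's log level to debug and switching it back afterwards. The wrapper name `cephadm_debug_logging` is hypothetical. The underlying `ceph config set mgr mgr/cephadm/log_to_cluster_level ...` setting and the watch commands in the comments are taken from the cephadm operations documentation as best recalled, so check them against your Pacific release. With `cephadm shell`, each `ceph` call would be prefixed with `cephadm shell --` as in the commands above.

```shell
# Hypothetical wrapper around the documented cephadm log-level setting.
cephadm_debug_logging() {
    case "$1" in
        on)  level=debug ;;
        off) level=info ;;   # assumed default level to restore
        *)   echo "usage: cephadm_debug_logging on|off" >&2; return 2 ;;
    esac
    ceph config set mgr mgr/cephadm/log_to_cluster_level "$level"
}

# Typical session:
#   cephadm_debug_logging on
#   ceph -W cephadm --watch-debug    # follow cephadm debug messages live
#   ceph log last cephadm            # or dump the most recent ones
#   cephadm_debug_logging off
```

The debug stream is very verbose, so it is worth switching it off again once the `ceph orch host add` attempt has been captured.
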
>
> Thanks,
> Lincoln Bryant
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx