cephadm / ceph orch : indefinite hang adding hosts to new cluster

Greetings list,

We have a new Ceph cluster we are trying to deploy on EL8 (CentOS Stream) using cephadm (+podman), targeting Pacific.

We can bootstrap the first host successfully, but any attempt to add additional hosts hangs indefinitely. We have confirmed that we can SSH from the first host to the subsequent hosts using the key generated by Ceph.
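For anyone wanting to repeat that SSH check, here is a rough sketch of one way to do it, following the approach described in the cephadm troubleshooting documentation. It assumes it is run somewhere the ceph CLI works (for example inside cephadm shell). The function name check_cephadm_ssh and the temporary directory are purely illustrative, and the final python3 check is our own addition, not something cephadm prescribes.

```shell
# Sketch: reproduce cephadm's SSH connection by hand from the bootstrap host.
# Assumption: the `ceph` CLI is usable here (e.g. inside `cephadm shell`).
# The function name and the temporary paths are hypothetical.
check_cephadm_ssh() {
    target="$1"                      # host or IP to test, e.g. 192.168.7.12
    workdir="$(mktemp -d)" || return 1

    # Export the SSH client config and the private key used by the cephadm mgr module.
    ceph cephadm get-ssh-config > "$workdir/ssh_config" || return 1
    ceph config-key get mgr/cephadm/ssh_identity_key > "$workdir/identity_key" || return 1
    chmod 0600 "$workdir/identity_key"

    # Connect as root with that config and key, and check that python3 is available,
    # since the mgr starts a python3 process on the remote host over SSH.
    ssh -F "$workdir/ssh_config" -i "$workdir/identity_key" "root@$target" \
        'python3 --version'
}
```

Usage would be along the lines of: check_cephadm_ssh 192.168.7.12. A successful run prints the remote Python version; a hang or an authentication error here would point at SSH rather than at the orchestrator.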

This is the only logging output we see:
# cephadm shell -- ceph orch host add kvm-mon02 192.168.7.12
Inferring fsid 826b9b36-4729-11ec-99f0-c81f66d05a38
Using recent ceph image quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3

This command hangs indefinitely until killed via podman kill.

Inspecting the host we're trying to add, we see that Ceph has launched a python process:
root        3604  0.0  0.0 164128  6316 ?        S    16:36   0:00  |   \_ sshd: root@notty
root        3605  0.0  0.0  31976  8752 ?        Ss   16:36   0:00  |       \_ python3 -c import sys;exec(eval(sys.stdin.readline()))

Inside of the mgr container, we see 2 SSH connections:
ceph         186  0.0  0.0  44076  6676 ?        S    22:31   0:00  \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i /tmp/cephadm-identity-8ku7ib6b root@192.168.7.13 python3 -c "import sys;exec(eval(sys.stdin.readline()))"
ceph         211  0.0  0.0  44076  6716 ?        S    22:36   0:00  \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i /tmp/cephadm-identity-8ku7ib6b root@192.168.7.12 python3 -c "import sys;exec(eval(sys.stdin.readline()))"

where 192.168.7.13 is the IP of the first host in the cluster (which has successfully bootstrapped and is running mgr, mon, and so on), and 192.168.7.12 is the host we are unsuccessfully trying to add.

The mgr logs show nothing particularly interesting except for:
debug 2021-11-16T22:39:03.570+0000 7fb6e4914700  0 [progress WARNING root] complete: ev de058df7-b54a-4429-933a-99abe7796715 does not exist
debug 2021-11-16T22:39:03.571+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 61fe4998-4ef4-4640-8a13-2b0928da737f does not exist
debug 2021-11-16T22:39:03.571+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 2323b586-5262-497e-b318-42702c0dc3dc does not exist
debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev f882757b-e03a-4798-8fae-4d55e2d1f531 does not exist
debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 3d35e91f-b475-4dbc-a040-65658e82fe67 does not exist
debug 2021-11-16T22:39:03.572+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 694ed4e6-b741-4bf3-9f7f-b11c549aca87 does not exist
debug 2021-11-16T22:39:03.573+0000 7fb6e4914700  0 [progress WARNING root] complete: ev 1ce718f4-a23b-4a27-8f7c-bc2610340403 does not exist

We've tried purging/reinstalling several times to no avail, and also tried swapping which host was used as the initial bootstrap mon and so on. I've also tried the option to disable the monitoring stack and that did not help either.

In any case, we are not sure how to proceed from here. Is there anything we can do to turn up logging verbosity, or are there other things to check? I've tried to find the ceph orch source code to understand what may be happening, but I'm not sure where to look.

Thanks,
Lincoln Bryant
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



