Re: Cephadm Offline Bootstrapping Issue

The thing that stands out to me from that output is that the image has no
repo_digests. It's possible cephadm expects digests to be present and is
crashing out trying to grab them for this image. I think it's worth trying
to set mgr/cephadm/use_repo_digest to false and then restarting the mgr.
FWIW, turning off that setting has resolved other issues related to
disconnected installs as well. It just means you should avoid using
floating tags.

On Thu, Aug 1, 2024 at 11:19 PM Alex Hussein-Kershaw (HE/HIM) <
alexhus@xxxxxxxxxxxxx> wrote:

> Hi,
>
> I'm hitting an issue doing an offline install of Ceph 18.2.2 using cephadm.
>
> Long output below... any advice is appreciated.
>
> Looks like we didn't manage to add the admin label (though trying with
> --skip-admin results in a similar health warning).
>
> Subsequently trying to add an OSD fails quietly, I assume because cephadm
> is unhappy.
>
> Thanks,
> Alex
>
> $  sudo  cephadm --image "ceph/ceph:v18.2.2" --docker bootstrap  --mon-ip
> `hostname -I` --skip-pull --ssh-user qs-admin --ssh-private-key
> /home/qs-admin/.ssh/id_rsa --ssh-public-key /home/qs-admin/.ssh/id_rsa.pub
> --skip-dashboard
> Verifying ssh connectivity using standard pubkey authentication ...
> Adding key to qs-admin@localhost authorized_keys...
> key already in qs-admin@localhost authorized_keys...
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit chronyd.service is enabled and running
> Repeating the final host check...
> docker (/usr/bin/docker) is present
> systemctl is present
> lvcreate is present
> Unit chronyd.service is enabled and running
> Host looks OK
> Cluster fsid: 65bee110-3ae6-11ef-a1de-005056013d88
> Verifying IP 10.235.22.8 port 3300 ...
> Verifying IP 10.235.22.8 port 6789 ...
> Mon IP `10.235.22.8` is in CIDR network `10.235.16.0/20`
> Mon IP `10.235.22.8` is in CIDR network `10.235.16.0/20`
> Internal network (--cluster-network) has not been provided, OSD
> replication will default to the public_network
> Ceph version: ceph version 18.2.2
> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> mon is available
> Assimilating anything we can from ceph.conf...
> Generating new minimal ceph.conf...
> Restarting the monitor...
> Setting public_network to 10.235.16.0/20 in mon config section
> Wrote config to /etc/ceph/ceph.conf
> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
> Creating mgr...
> Verifying port 0.0.0.0:9283 ...
> Verifying port 0.0.0.0:8765 ...
> Verifying port 0.0.0.0:8443 ...
> Waiting for mgr to start...
> Waiting for mgr...
> mgr not available, waiting (1/15)...
> mgr not available, waiting (2/15)...
> mgr not available, waiting (3/15)...
> mgr not available, waiting (4/15)...
> mgr not available, waiting (5/15)...
> mgr is available
> Enabling cephadm module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 5...
> mgr epoch 5 is available
> Setting orchestrator backend to cephadm...
> Using provided ssh keys...
> Adding key to qs-admin@localhost authorized_keys...
> key already in qs-admin@localhost authorized_keys...
> Adding host starlight-1...
> Deploying mon service with default placement...
> Deploying mgr service with default placement...
> Deploying crash service with default placement...
> Deploying ceph-exporter service with default placement...
> Deploying prometheus service with default placement...
> Deploying grafana service with default placement...
> Deploying node-exporter service with default placement...
> Deploying alertmanager service with default placement...
> Enabling client.admin keyring and conf on hosts with "admin" label
> Non-zero exit code 5 from /usr/bin/docker run --rm --ipc=host
> --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint
> /usr/bin/ceph --init -e CONTAINER_IMAGE=ceph/ceph:v18.2.2 -e
> NODE_NAME=starlight-1 -e CEPH_USE_RANDOM_NONCE=1 -v
> /var/log/ceph/65bee110-3ae6-11ef-a1de-005056013d88:/var/log/ceph:z -v
> /tmp/ceph-tmpxbngx708:/etc/ceph/ceph.client.admin.keyring:z -v
> /tmp/ceph-tmp94g7iyn2:/etc/ceph/ceph.conf:z ceph/ceph:v18.2.2 orch
> client-keyring set client.admin label:_admin
> /usr/bin/ceph: stderr Error EIO: Module 'cephadm' has experienced an error
> and cannot handle commands:
> ContainerInspectInfo(image_id='3c937764e6f5de1131b469dc69f0db09f8bd55cf6c983482cde518596d3dd0e5',
> ceph_version='ceph version 18.2.2
> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)',
> repo_digests=[''])
> Unable to set up "admin" label; assuming older version of Ceph
> Saving cluster configuration to
> /var/lib/ceph/65bee110-3ae6-11ef-a1de-005056013d88/config directory
> Enabling autotune for osd_memory_target
> You can access the Ceph CLI as following in case of multi-cluster or
> non-default config:
>
>         sudo /usr/sbin/cephadm shell --fsid
> 65bee110-3ae6-11ef-a1de-005056013d88 -c /etc/ceph/ceph.conf -k
> /etc/ceph/ceph.client.admin.keyring
>
> Or, if you are only running a single cluster on this host:
>
>         sudo /usr/sbin/cephadm shell
>
> Please consider enabling telemetry to help improve Ceph:
>
>         ceph telemetry on
>
> For more information see:
>
>         https://docs.ceph.com/en/latest/mgr/telemetry/
>
> Bootstrap complete.
>
>
> $ sudo docker exec
> ceph-1b19e642-3ae5-11ef-b4e4-005056013d88-mon-starlight-1 ceph -s
>   cluster:
>     id:     1b19e642-3ae5-11ef-b4e4-005056013d88
>     health: HEALTH_ERR
>             Module 'cephadm' has failed:
> ContainerInspectInfo(image_id='3c937764e6f5de1131b469dc69f0db09f8bd55cf6c983482cde518596d3dd0e5',
> ceph_version='ceph version 18.2.2
> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)',
> repo_digests=[''])
>             OSD count 0 < osd_pool_default_size 3
>
>   services:
>     mon: 1 daemons, quorum starlight-1 (age 2m)
>     mgr: starlight-1.yhqrry(active, since 107s)
>     osd: 0 osds: 0 up, 0 in
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   0 B used, 0 B / 0 B avail
>     pgs:
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



