Re: Cephadm Offline Bootstrapping Issue

You might want to try my "bringing up an OSD really, really fast"
package (https://gogs.mousetech.com/mtsinc7/instant_osd).

It's actually for spinning up a VM with an OSD in it, although you can
skip the VM setup script if you're on a bare OS and just run the
Ansible part.
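
If you do go the bare-OS route, the invocation is just a normal
Ansible run; roughly something like this (the playbook and inventory
names below are placeholders on my part, check the repo for the real
ones):

    # Run only the Ansible part against an existing host; playbook and
    # inventory names are placeholders -- see the repo for the actual ones.
    ansible-playbook -i inventory.ini instant_osd.yml --limit my-osd-host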

Apologies to anyone who tried to pull it last week, as lightning
destroyed the cable to my ISP and it took them 5 days to get me back on
the Internet. So much for having a business account.

One quirk: you may need to manually copy in your ceph osd-bootstrap
key to get the operation to complete. I'm not sure why, since I'd
expect cephadm to have dealt with that. The key has to be located in
/etc/ceph, not in /etc/ceph/<fsid>, and this may further be affected
by the filesystem differences between the inside and the outside of
the cephadm shell. Mildly annoying, but not too bad. Someday I hope
to get that part locked down.
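
Roughly, that manual copy amounts to something like the below (the
target filename is a guess on my part, adjust to whatever your setup
expects):

    # Export the OSD bootstrap key and drop it into /etc/ceph on the
    # host, not /etc/ceph/<fsid>; the filename here is a guess.
    sudo cephadm shell -- ceph auth get client.bootstrap-osd \
        | sudo tee /etc/ceph/ceph.keyring >/dev/null
    sudo chmod 600 /etc/ceph/ceph.keyring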

   Tim

On Fri, 2024-08-02 at 13:24 +0000, Eugen Block wrote:
> Hi,
> 
> I haven't seen that one yet. Can you show the output from these
> commands?
> 
> ceph orch client-keyring ls
> ceph orch client-keyring set client.admin label:_admin
> 
> Is there anything helpful in the mgr log?
> 
> Zitat von "Alex Hussein-Kershaw (HE/HIM)" <alexhus@xxxxxxxxxxxxx>:
> 
> > Hi,
> > 
> > I'm hitting an issue doing an offline install of Ceph 18.2.2 using
> > cephadm.
> > 
> > Long output below... any advice is appreciated.
> > 
> > Looks like we didn't manage to add admin labels (but also trying
> > with --skip-admin results in a similar health warning).
> > 
> > Subsequently trying to add an OSD fails quietly, I assume because  
> > cephadm is unhappy.
> > 
> > Thanks,
> > Alex
> > 
> > $  sudo  cephadm --image "ceph/ceph:v18.2.2" --docker bootstrap   
> > --mon-ip `hostname -I` --skip-pull --ssh-user qs-admin  
> > --ssh-private-key /home/qs-admin/.ssh/id_rsa --ssh-public-key  
> > /home/qs-admin/.ssh/id_rsa.pub  --skip-dashboard
> > Verifying ssh connectivity using standard pubkey authentication ...
> > Adding key to qs-admin@localhost authorized_keys...
> > key already in qs-admin@localhost authorized_keys...
> > Verifying podman|docker is present...
> > Verifying lvm2 is present...
> > Verifying time synchronization is in place...
> > Unit chronyd.service is enabled and running
> > Repeating the final host check...
> > docker (/usr/bin/docker) is present
> > systemctl is present
> > lvcreate is present
> > Unit chronyd.service is enabled and running
> > Host looks OK
> > Cluster fsid: 65bee110-3ae6-11ef-a1de-005056013d88
> > Verifying IP 10.235.22.8 port 3300 ...
> > Verifying IP 10.235.22.8 port 6789 ...
> > Mon IP `10.235.22.8` is in CIDR network `10.235.16.0/20`
> > Mon IP `10.235.22.8` is in CIDR network `10.235.16.0/20`
> > Internal network (--cluster-network) has not been provided, OSD  
> > replication will default to the public_network
> > Ceph version: ceph version 18.2.2  
> > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
> > Extracting ceph user uid/gid from container image...
> > Creating initial keys...
> > Creating initial monmap...
> > Creating mon...
> > Waiting for mon to start...
> > Waiting for mon...
> > mon is available
> > Assimilating anything we can from ceph.conf...
> > Generating new minimal ceph.conf...
> > Restarting the monitor...
> > Setting public_network to 10.235.16.0/20 in mon config section
> > Wrote config to /etc/ceph/ceph.conf
> > Wrote keyring to /etc/ceph/ceph.client.admin.keyring
> > Creating mgr...
> > Verifying port 0.0.0.0:9283 ...
> > Verifying port 0.0.0.0:8765 ...
> > Verifying port 0.0.0.0:8443 ...
> > Waiting for mgr to start...
> > Waiting for mgr...
> > mgr not available, waiting (1/15)...
> > mgr not available, waiting (2/15)...
> > mgr not available, waiting (3/15)...
> > mgr not available, waiting (4/15)...
> > mgr not available, waiting (5/15)...
> > mgr is available
> > Enabling cephadm module...
> > Waiting for the mgr to restart...
> > Waiting for mgr epoch 5...
> > mgr epoch 5 is available
> > Setting orchestrator backend to cephadm...
> > Using provided ssh keys...
> > Adding key to qs-admin@localhost authorized_keys...
> > key already in qs-admin@localhost authorized_keys...
> > Adding host starlight-1...
> > Deploying mon service with default placement...
> > Deploying mgr service with default placement...
> > Deploying crash service with default placement...
> > Deploying ceph-exporter service with default placement...
> > Deploying prometheus service with default placement...
> > Deploying grafana service with default placement...
> > Deploying node-exporter service with default placement...
> > Deploying alertmanager service with default placement...
> > Enabling client.admin keyring and conf on hosts with "admin" label
> > Non-zero exit code 5 from /usr/bin/docker run --rm --ipc=host  
> > --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host  
> > --entrypoint /usr/bin/ceph --init -e  
> > CONTAINER_IMAGE=ceph/ceph:v18.2.2 -e NODE_NAME=starlight-1 -e  
> > CEPH_USE_RANDOM_NONCE=1 -v  
> > /var/log/ceph/65bee110-3ae6-11ef-a1de-005056013d88:/var/log/ceph:z 
> > -v /tmp/ceph-tmpxbngx708:/etc/ceph/ceph.client.admin.keyring:z -v  
> > /tmp/ceph-tmp94g7iyn2:/etc/ceph/ceph.conf:z ceph/ceph:v18.2.2 orch 
> > client-keyring set client.admin label:_admin
> > /usr/bin/ceph: stderr Error EIO: Module 'cephadm' has experienced
> > an  
> > error and cannot handle commands:  
> > ContainerInspectInfo(image_id='3c937764e6f5de1131b469dc69f0db09f8bd
> > 55cf6c983482cde518596d3dd0e5', ceph_version='ceph version 18.2.2
> > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)',  
> > repo_digests=[''])
> > Unable to set up "admin" label; assuming older version of Ceph
> > Saving cluster configuration to  
> > /var/lib/ceph/65bee110-3ae6-11ef-a1de-005056013d88/config directory
> > Enabling autotune for osd_memory_target
> > You can access the Ceph CLI as following in case of multi-cluster
> > or  
> > non-default config:
> > 
> >         sudo /usr/sbin/cephadm shell --fsid  
> > 65bee110-3ae6-11ef-a1de-005056013d88 -c /etc/ceph/ceph.conf -k  
> > /etc/ceph/ceph.client.admin.keyring
> > 
> > Or, if you are only running a single cluster on this host:
> > 
> >         sudo /usr/sbin/cephadm shell
> > 
> > Please consider enabling telemetry to help improve Ceph:
> > 
> >         ceph telemetry on
> > 
> > For more information see:
> > 
> >         https://docs.ceph.com/en/latest/mgr/telemetry/
> > 
> > Bootstrap complete.
> > 
> > 
> > ]$ sudo docker exec  
> > ceph-1b19e642-3ae5-11ef-b4e4-005056013d88-mon-starlight-1 ceph -s
> >   cluster:
> >     id:     1b19e642-3ae5-11ef-b4e4-005056013d88
> >     health: HEALTH_ERR
> >             Module 'cephadm' has failed:  
> > ContainerInspectInfo(image_id='3c937764e6f5de1131b469dc69f0db09f8bd
> > 55cf6c983482cde518596d3dd0e5', ceph_version='ceph version 18.2.2
> > (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)',  
> > repo_digests=[''])
> >             OSD count 0 < osd_pool_default_size 3
> > 
> >   services:
> >     mon: 1 daemons, quorum starlight-1 (age 2m)
> >     mgr: starlight-1.yhqrry(active, since 107s)
> >     osd: 0 osds: 0 up, 0 in
> > 
> >   data:
> >     pools:   0 pools, 0 pgs
> >     objects: 0 objects, 0 B
> >     usage:   0 B used, 0 B / 0 B avail
> >     pgs:
> > 
> > 
> > 
> > 
> > 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



