Re: error deploying ceph

That message in the `ceph orch device ls` output is just the reason the
device is unavailable for an OSD. In this case it reports insufficient
space because you've already put an OSD on it, so it's really just telling
you that you can't place another one there. You can expect to see something
like that for each device you place an OSD on, and it's nothing to worry
about. It's useful information if, for example, you remove the OSD
associated with the device but forget to zap the device afterwards, and are
wondering why you can't put another OSD on it later.
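
For completeness, that cleanup path is roughly the following (the OSD id and
the host/device here are just examples taken from your output; substitute
your own):

  # remove the OSD, then wipe the LVM metadata it left on the device
  cephadm shell -- ceph orch osd rm 0
  cephadm shell -- ceph orch device zap node1-ceph /dev/xvdb --force

After the zap, the device should show up as available again in
`ceph orch device ls`.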

On Thu, Nov 30, 2023 at 8:00 AM Francisco Arencibia Quesada <
arencibia.francisco@xxxxxxxxx> wrote:

> Thanks again guys,
>
> The cluster is healthy now. Is this normal? Everything looks good except
> for this output:
> Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
>
> root@node1-ceph:~# cephadm shell -- ceph status
> Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
> Inferring config
> /var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
> Using ceph image with id '921993c4dfd2' and tag 'v17' created on
> 2023-11-22 16:03:22 +0000 UTC
>
> quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
>   cluster:
>     id:     209a7bf0-8f6d-11ee-8828-23977d76b74f
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum node1-ceph,node2-ceph,node3-ceph (age 2h)
>     mgr: node1-ceph.peedpx(active, since 2h), standbys: node2-ceph.ykkvho
>     osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
>
>   data:
>     pools:   2 pools, 33 pgs
>     objects: 7 objects, 449 KiB
>     usage:   873 MiB used, 299 GiB / 300 GiB avail
>     pgs:     33 active+clean
>
> root@node1-ceph:~# cephadm shell -- ceph orch device ls --wide
> Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
> Inferring config
> /var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
> Using ceph image with id '921993c4dfd2' and tag 'v17' created on
> 2023-11-22 16:03:22 +0000 UTC
>
> quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
> HOST        PATH       TYPE  TRANSPORT  RPM  DEVICE ID  SIZE  HEALTH  IDENT  FAULT  AVAILABLE  REFRESHED  REJECT REASONS
> node1-ceph  /dev/xvdb  ssd                              100G          N/A    N/A    No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> node2-ceph  /dev/xvdb  ssd                              100G          N/A    N/A    No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> node3-ceph  /dev/xvdb  ssd                              100G          N/A    N/A    No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> root@node1-ceph:~#
>
> On Wed, Nov 29, 2023 at 10:38 PM Adam King <adking@xxxxxxxxxx> wrote:
>
>> To run a `ceph orch...` (or really any command to the cluster) you should
>> first open a shell with `cephadm shell`. That will put you in a bash shell
>> inside a container that has the ceph packages matching the ceph version in
>> your cluster. If you just want a single command rather than an interactive
>> shell, you can also do `cephadm shell -- ceph orch...`. Also, this might
>> not turn out to be an issue, but just thinking ahead, the devices cephadm
>> will typically allow you to put an OSD on should match what's output by
>> `ceph orch device ls` (which is populated by `cephadm ceph-volume --
>> inventory --format=json-pretty` if you want to look further). So I'd
>> generally say to always check that before making any OSDs through the
>> orchestrator. I also generally like to recommend setting up OSDs through
>> drive group specs (
>> https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications)
>> over using `ceph orch daemon add osd...` although that's a tangent to what
>> you're trying to do now.
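>>
>> As a minimal sketch of such a spec (the service_id and the host pattern
>> are just placeholders), something like the following, applied with
>> `ceph orch apply -i osd-spec.yaml` from inside a `cephadm shell` (the
>> file has to be visible inside the container), would create OSDs on every
>> available device the orchestrator reports:
>>
>>   service_type: osd
>>   service_id: all_available_devices
>>   placement:
>>     host_pattern: '*'
>>   spec:
>>     data_devices:
>>       all: true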
>>
>> On Wed, Nov 29, 2023 at 4:14 PM Francisco Arencibia Quesada <
>> arencibia.francisco@xxxxxxxxx> wrote:
>>
>>> Thanks so much Adam, that worked great; however, I cannot add any
>>> storage with:
>>>
>>> sudo cephadm ceph orch daemon add osd node2-ceph:/dev/nvme1n1
>>>
>>> root@node1-ceph:~# ceph status
>>>   cluster:
>>>     id:     9d8f1112-8ef9-11ee-838e-a74e679f7866
>>>     health: HEALTH_WARN
>>>             Failed to apply 1 service(s): osd.all-available-devices
>>>             2 failed cephadm daemon(s)
>>>             OSD count 0 < osd_pool_default_size 3
>>>
>>>   services:
>>>     mon: 1 daemons, quorum node1-ceph (age 18m)
>>>     mgr: node1-ceph.jitjfd(active, since 17m)
>>>     osd: 0 osds: 0 up, 0 in (since 6m)
>>>
>>>   data:
>>>     pools:   0 pools, 0 pgs
>>>     objects: 0 objects, 0 B
>>>     usage:   0 B used, 0 B / 0 B avail
>>>     pgs:
>>>
>>> root@node1-ceph:~#
>>>
>>> Regards
>>>
>>>
>>>
>>> On Wed, Nov 29, 2023 at 5:45 PM Adam King <adking@xxxxxxxxxx> wrote:
>>>
>>>> I think I remember a bug that happened when there was a small mismatch
>>>> between the cephadm version being used for bootstrapping and the container.
>>>> In this case, the cephadm binary used for bootstrap knows about the
>>>> ceph-exporter service and the container image being used does not. The
>>>> ceph-exporter was removed from quincy between 17.2.6 and 17.2.7, so I'd
>>>> guess the cephadm binary here is a bit older and it's pulling the 17.2.7
>>>> image. For now, I'd say just work around this by running bootstrap with
>>>> the `--skip-monitoring-stack` flag. If you want the other services in the
>>>> monitoring stack after bootstrap, you can just run `ceph orch apply
>>>> <service>` for alertmanager, prometheus, node-exporter, and grafana, and
>>>> that would get you to the same spot as if you hadn't provided the flag
>>>> and weren't hitting the issue.
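>>>>
>>>> As a rough sketch, that path would look something like this (the mon IP
>>>> is just the one from your log below; adjust as needed):
>>>>
>>>>   # bootstrap without the monitoring stack to dodge the ceph-exporter issue
>>>>   cephadm bootstrap --mon-ip 10.0.0.52 --skip-monitoring-stack
>>>>
>>>>   # once the cluster is up, deploy the monitoring services individually
>>>>   cephadm shell -- ceph orch apply prometheus
>>>>   cephadm shell -- ceph orch apply alertmanager
>>>>   cephadm shell -- ceph orch apply node-exporter
>>>>   cephadm shell -- ceph orch apply grafana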
>>>>
>>>> For an extra note, this failed bootstrap might be leaving things around
>>>> that could cause subsequent bootstraps to fail. If you run `cephadm ls` and
>>>> see things listed, you can grab the fsid from the output of that command
>>>> and run `cephadm rm-cluster --force --fsid <fsid>` to clean up the env
>>>> before bootstrapping again.
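>>>>
>>>> With the fsid from your bootstrap log below, for example, that cleanup
>>>> would be:
>>>>
>>>>   cephadm ls
>>>>   cephadm rm-cluster --force --fsid 4ce3a92a-8ddd-11ee-9b23-6341187f70c1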
>>>>
>>>> On Wed, Nov 29, 2023 at 11:32 AM Francisco Arencibia Quesada <
>>>> arencibia.francisco@xxxxxxxxx> wrote:
>>>>
>>>>> Hello guys,
>>>>>
>>>>> This situation is driving me crazy. I have tried to deploy a Ceph
>>>>> cluster in every way possible, even with Ansible, and at some point it
>>>>> breaks. I'm using Ubuntu 22.04. This is one of the errors I'm getting,
>>>>> some problem with ceph-exporter. Please could you help me? I have been
>>>>> dealing with this for about 5 days.
>>>>> Kind regards
>>>>>
>>>>>  root@node1-ceph:~# cephadm bootstrap --mon-ip 10.0.0.52
>>>>> Verifying podman|docker is present...
>>>>> Verifying lvm2 is present...
>>>>> Verifying time synchronization is in place...
>>>>> Unit systemd-timesyncd.service is enabled and running
>>>>> Repeating the final host check...
>>>>> docker (/usr/bin/docker) is present
>>>>> systemctl is present
>>>>> lvcreate is present
>>>>> Unit systemd-timesyncd.service is enabled and running
>>>>> Host looks OK
>>>>> Cluster fsid: 4ce3a92a-8ddd-11ee-9b23-6341187f70c1
>>>>> Verifying IP 10.0.0.52 port 3300 ...
>>>>> Verifying IP 10.0.0.52 port 6789 ...
>>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
>>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
>>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
>>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
>>>>> Internal network (--cluster-network) has not been provided, OSD
>>>>> replication
>>>>> will default to the public_network
>>>>> Pulling container image quay.io/ceph/ceph:v17...
>>>>> Ceph version: ceph version 17.2.7
>>>>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>>>>> Extracting ceph user uid/gid from container image...
>>>>> Creating initial keys...
>>>>> Creating initial monmap...
>>>>> Creating mon...
>>>>> Waiting for mon to start...
>>>>> Waiting for mon...
>>>>> mon is available
>>>>> Assimilating anything we can from ceph.conf...
>>>>> Generating new minimal ceph.conf...
>>>>> Restarting the monitor...
>>>>> Setting mon public_network to 10.0.0.1/32,10.0.0.0/24
>>>>> Wrote config to /etc/ceph/ceph.conf
>>>>> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
>>>>> Creating mgr...
>>>>> Verifying port 9283 ...
>>>>> Waiting for mgr to start...
>>>>> Waiting for mgr...
>>>>> mgr not available, waiting (1/15)...
>>>>> mgr not available, waiting (2/15)...
>>>>> mgr not available, waiting (3/15)...
>>>>> mgr not available, waiting (4/15)...
>>>>> mgr not available, waiting (5/15)...
>>>>> mgr is available
>>>>> Enabling cephadm module...
>>>>> Waiting for the mgr to restart...
>>>>> Waiting for mgr epoch 5...
>>>>> mgr epoch 5 is available
>>>>> Setting orchestrator backend to cephadm...
>>>>> Generating ssh key...
>>>>> Wrote public SSH key to /etc/ceph/ceph.pub
>>>>> Adding key to root@localhost authorized_keys...
>>>>> Adding host node1-ceph...
>>>>> Deploying mon service with default placement...
>>>>> Deploying mgr service with default placement...
>>>>> Deploying crash service with default placement...
>>>>> Deploying ceph-exporter service with default placement...
>>>>> Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host
>>>>> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
>>>>> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
>>>>> CEPH_USE_RANDOM_NONCE=1 -v
>>>>> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
>>>>> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
>>>>> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
>>>>> apply ceph-exporter
>>>>> /usr/bin/ceph: stderr Error EINVAL: Usage:
>>>>> /usr/bin/ceph: stderr   ceph orch apply -i <yaml spec> [--dry-run]
>>>>> /usr/bin/ceph: stderr   ceph orch apply <service_type>
>>>>> [--placement=<placement_string>] [--unmanaged]
>>>>> /usr/bin/ceph: stderr
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/sbin/cephadm", line 9653, in <module>
>>>>>     main()
>>>>>   File "/usr/sbin/cephadm", line 9641, in main
>>>>>     r = ctx.func(ctx)
>>>>>   File "/usr/sbin/cephadm", line 2205, in _default_image
>>>>>     return func(ctx)
>>>>>   File "/usr/sbin/cephadm", line 5774, in command_bootstrap
>>>>>     prepare_ssh(ctx, cli, wait_for_mgr_restart)
>>>>>   File "/usr/sbin/cephadm", line 5275, in prepare_ssh
>>>>>     cli(['orch', 'apply', t])
>>>>>   File "/usr/sbin/cephadm", line 5708, in cli
>>>>>     return CephContainer(
>>>>>   File "/usr/sbin/cephadm", line 4144, in run
>>>>>     out, _, _ = call_throws(self.ctx, self.run_cmd(),
>>>>>   File "/usr/sbin/cephadm", line 1853, in call_throws
>>>>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
>>>>> RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
>>>>> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
>>>>> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
>>>>> CEPH_USE_RANDOM_NONCE=1 -v
>>>>> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
>>>>> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
>>>>> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
>>>>> apply ceph-exporter
>>>>>
>>>>> --
>>>>> *Francisco Arencibia Quesada.*
>>>>> *DevOps Engineer*
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>
>>>>>
>>>
>>> --
>>> *Francisco Arencibia Quesada.*
>>> *DevOps Engineer*
>>>
>>
>
> --
> *Francisco Arencibia Quesada.*
> *DevOps Engineer*
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



