Re: error deploying ceph

Thanks again guys,

The cluster is healthy now, but is this normal? All looks good except for this
output:
*Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected*

root@node1-ceph:~# cephadm shell -- ceph status
Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
Inferring config
/var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
Using ceph image with id '921993c4dfd2' and tag 'v17' created on 2023-11-22 16:03:22 +0000 UTC
quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
  cluster:
    id:     209a7bf0-8f6d-11ee-8828-23977d76b74f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node1-ceph,node2-ceph,node3-ceph (age 2h)
    mgr: node1-ceph.peedpx(active, since 2h), standbys: node2-ceph.ykkvho
    osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
    pools:   2 pools, 33 pgs
    objects: 7 objects, 449 KiB
    usage:   873 MiB used, 299 GiB / 300 GiB avail
    pgs:     33 active+clean

root@node1-ceph:~# cephadm shell -- ceph orch device ls --wide
Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
Inferring config
/var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
Using ceph image with id '921993c4dfd2' and tag 'v17' created on 2023-11-22 16:03:22 +0000 UTC
quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
HOST        PATH       TYPE  TRANSPORT  RPM  DEVICE ID   SIZE  HEALTH  IDENT  FAULT  AVAILABLE  REFRESHED  REJECT REASONS
node1-ceph  /dev/xvdb  ssd                               100G  N/A     N/A           No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
node2-ceph  /dev/xvdb  ssd                               100G  N/A     N/A           No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
node3-ceph  /dev/xvdb  ssd                               100G  N/A     N/A           No         27m ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
root@node1-ceph:~#

On Wed, Nov 29, 2023 at 10:38 PM Adam King <adking@xxxxxxxxxx> wrote:

> To run a `ceph orch...` (or really any command to the cluster) you should
> first open a shell with `cephadm shell`. That will put you in a bash shell
> inside a container that has the ceph packages matching the ceph version in
> your cluster. If you just want a single command rather than an interactive
> shell, you can also do `cephadm shell -- ceph orch...`. Also, this might
> not turn out to be an issue, but just thinking ahead, the devices cephadm
> will typically allow you to put an OSD on should match what's output by
> `ceph orch device ls` (which is populated by `cephadm ceph-volume --
> inventory --format=json-pretty` if you want to look further). So I'd
> generally say to always check that before making any OSDs through the
> orchestrator. I also generally like to recommend setting up OSDs through
> drive group specs (
> https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications)
> over using `ceph orch daemon add osd...` although that's a tangent to what
> you're trying to do now.
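>
> For illustration only, a minimal drive group spec along the lines of that
> doc page might look like the sketch below, run from inside `cephadm shell`
> on the admin node; the service_id and the rotational filter are just
> placeholders for your environment:
>
> cat > /tmp/osd-spec.yaml <<'EOF'
> service_type: osd
> service_id: example_osds      # hypothetical name, pick your own
> placement:
>   host_pattern: '*'           # or list your hosts explicitly
> spec:
>   data_devices:
>     rotational: 0             # only match non-rotational (SSD) devices
> EOF
> ceph orch apply -i /tmp/osd-spec.yaml --dry-run   # preview what would be created
> ceph orch apply -i /tmp/osd-spec.yaml
>
> The --dry-run pass (listed in the `ceph orch apply` usage) is a cheap way
> to confirm the spec only matches the devices you expect before any OSDs
> are created.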
>
> On Wed, Nov 29, 2023 at 4:14 PM Francisco Arencibia Quesada <
> arencibia.francisco@xxxxxxxxx> wrote:
>
>> Thanks so much Adam, that worked great. However, I cannot add any storage
>> with:
>>
>> sudo cephadm ceph orch daemon add osd node2-ceph:/dev/nvme1n1
>>
>> root@node1-ceph:~# ceph status
>>   cluster:
>>     id:     9d8f1112-8ef9-11ee-838e-a74e679f7866
>>     health: HEALTH_WARN
>>             Failed to apply 1 service(s): osd.all-available-devices
>>             2 failed cephadm daemon(s)
>>             OSD count 0 < osd_pool_default_size 3
>>
>>   services:
>>     mon: 1 daemons, quorum node1-ceph (age 18m)
>>     mgr: node1-ceph.jitjfd(active, since 17m)
>>     osd: 0 osds: 0 up, 0 in (since 6m)
>>
>>   data:
>>     pools:   0 pools, 0 pgs
>>     objects: 0 objects, 0 B
>>     usage:   0 B used, 0 B / 0 B avail
>>     pgs:
>>
>> root@node1-ceph:~#
>>
>> Regards
>>
>>
>>
>> On Wed, Nov 29, 2023 at 5:45 PM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> I think I remember a bug that happened when there was a small mismatch
>>> between the cephadm version being used for bootstrapping and the container.
>>> In this case, the cephadm binary used for bootstrap knows about the
>>> ceph-exporter service and the container image being used does not. The
>>> ceph-exporter was removed from quincy between 17.2.6 and 17.2.7 so I'd
>>> guess the cephadm binary here is a bit older and it's pulling the 17.2.7
>>> image. For now, I'd say just work around this by running bootstrap with the
>>> `--skip-monitoring-stack` flag. If you want the other services in the
>>> monitoring stack after bootstrap you can just run `ceph orch apply
>>> <service>` for services alertmanager, prometheus, node-exporter, and
>>> grafana and it would get you in the same spot as if you didn't provide the
>>> flag and weren't hitting the issue.
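>>>
>>> Concretely, that would look something like this (mon IP taken from your
>>> bootstrap output below; just a sketch):
>>>
>>> cephadm bootstrap --mon-ip 10.0.0.52 --skip-monitoring-stack
>>> # later, once the cluster is up, re-add the monitoring stack:
>>> cephadm shell -- ceph orch apply alertmanager
>>> cephadm shell -- ceph orch apply prometheus
>>> cephadm shell -- ceph orch apply node-exporter
>>> cephadm shell -- ceph orch apply grafana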
>>>
>>> For an extra note, this failed bootstrap might be leaving things around
>>> that could cause subsequent bootstraps to fail. If you run `cephadm ls` and
>>> see things listed, you can grab the fsid from the output of that command
>>> and run `cephadm rm-cluster --force --fsid <fsid>` to clean up the env
>>> before bootstrapping again.
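>>>
>>> In other words, roughly:
>>>
>>> cephadm ls                                 # note the fsid of any leftover cluster
>>> cephadm rm-cluster --force --fsid <fsid>   # clean up the failed bootstrap
>>> cephadm bootstrap --mon-ip 10.0.0.52 --skip-monitoring-stack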
>>>
>>> On Wed, Nov 29, 2023 at 11:32 AM Francisco Arencibia Quesada <
>>> arencibia.francisco@xxxxxxxxx> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> This situation is driving me crazy. I have tried to deploy a Ceph cluster
>>>> in every way possible, even with Ansible, and at some point it breaks. I'm
>>>> using Ubuntu 22.04. This is one of the errors I'm having, some problem
>>>> with ceph-exporter. Please could you help me? I have been dealing with
>>>> this for about 5 days.
>>>> Kind regards
>>>>
>>>>  root@node1-ceph:~# cephadm bootstrap --mon-ip 10.0.0.52
>>>> Verifying podman|docker is present...
>>>> Verifying lvm2 is present...
>>>> Verifying time synchronization is in place...
>>>> Unit systemd-timesyncd.service is enabled and running
>>>> Repeating the final host check...
>>>> docker (/usr/bin/docker) is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit systemd-timesyncd.service is enabled and running
>>>> Host looks OK
>>>> Cluster fsid: 4ce3a92a-8ddd-11ee-9b23-6341187f70c1
>>>> Verifying IP 10.0.0.52 port 3300 ...
>>>> Verifying IP 10.0.0.52 port 6789 ...
>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.0/24`
>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
>>>> Mon IP `10.0.0.52` is in CIDR network `10.0.0.1/32`
>>>> Internal network (--cluster-network) has not been provided, OSD
>>>> replication
>>>> will default to the public_network
>>>> Pulling container image quay.io/ceph/ceph:v17...
>>>> Ceph version: ceph version 17.2.7
>>>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
>>>> Extracting ceph user uid/gid from container image...
>>>> Creating initial keys...
>>>> Creating initial monmap...
>>>> Creating mon...
>>>> Waiting for mon to start...
>>>> Waiting for mon...
>>>> mon is available
>>>> Assimilating anything we can from ceph.conf...
>>>> Generating new minimal ceph.conf...
>>>> Restarting the monitor...
>>>> Setting mon public_network to 10.0.0.1/32,10.0.0.0/24
>>>> Wrote config to /etc/ceph/ceph.conf
>>>> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
>>>> Creating mgr...
>>>> Verifying port 9283 ...
>>>> Waiting for mgr to start...
>>>> Waiting for mgr...
>>>> mgr not available, waiting (1/15)...
>>>> mgr not available, waiting (2/15)...
>>>> mgr not available, waiting (3/15)...
>>>> mgr not available, waiting (4/15)...
>>>> mgr not available, waiting (5/15)...
>>>> mgr is available
>>>> Enabling cephadm module...
>>>> Waiting for the mgr to restart...
>>>> Waiting for mgr epoch 5...
>>>> mgr epoch 5 is available
>>>> Setting orchestrator backend to cephadm...
>>>> Generating ssh key...
>>>> Wrote public SSH key to /etc/ceph/ceph.pub
>>>> Adding key to root@localhost authorized_keys...
>>>> Adding host node1-ceph...
>>>> Deploying mon service with default placement...
>>>> Deploying mgr service with default placement...
>>>> Deploying crash service with default placement...
>>>> Deploying ceph-exporter service with default placement...
>>>> Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host
>>>> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
>>>> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
>>>> CEPH_USE_RANDOM_NONCE=1 -v
>>>> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
>>>> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
>>>> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
>>>> apply ceph-exporter
>>>> /usr/bin/ceph: stderr Error EINVAL: Usage:
>>>> /usr/bin/ceph: stderr   ceph orch apply -i <yaml spec> [--dry-run]
>>>> /usr/bin/ceph: stderr   ceph orch apply <service_type>
>>>> [--placement=<placement_string>] [--unmanaged]
>>>> /usr/bin/ceph: stderr
>>>> Traceback (most recent call last):
>>>>   File "/usr/sbin/cephadm", line 9653, in <module>
>>>>     main()
>>>>   File "/usr/sbin/cephadm", line 9641, in main
>>>>     r = ctx.func(ctx)
>>>>   File "/usr/sbin/cephadm", line 2205, in _default_image
>>>>     return func(ctx)
>>>>   File "/usr/sbin/cephadm", line 5774, in command_bootstrap
>>>>     prepare_ssh(ctx, cli, wait_for_mgr_restart)
>>>>   File "/usr/sbin/cephadm", line 5275, in prepare_ssh
>>>>     cli(['orch', 'apply', t])
>>>>   File "/usr/sbin/cephadm", line 5708, in cli
>>>>     return CephContainer(
>>>>   File "/usr/sbin/cephadm", line 4144, in run
>>>>     out, _, _ = call_throws(self.ctx, self.run_cmd(),
>>>>   File "/usr/sbin/cephadm", line 1853, in call_throws
>>>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
>>>> RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
>>>> --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e
>>>> CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=node1-ceph -e
>>>> CEPH_USE_RANDOM_NONCE=1 -v
>>>> /var/log/ceph/4ce3a92a-8ddd-11ee-9b23-6341187f70c1:/var/log/ceph:z -v
>>>> /tmp/ceph-tmp6yz3vt5s:/etc/ceph/ceph.client.admin.keyring:z -v
>>>> /tmp/ceph-tmpfhd01qwu:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch
>>>> apply ceph-exporter
>>>>
>>>> --
>>>> *Francisco Arencibia Quesada.*
>>>> *DevOps Engineer*
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>
>>>>
>>
>> --
>> *Francisco Arencibia Quesada.*
>> *DevOps Engineer*
>>
>

-- 
*Francisco Arencibia Quesada.*
*DevOps Engineer*
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



