Re: Ceph osd will not start.

Hi,

I believe your current issue is due to a missing keyring for client.bootstrap-osd on the OSD node (more on that below). But even after fixing that you probably still won't be able to deploy an OSD manually with ceph-volume, because 'ceph-volume activate' is not supported with cephadm [1]. I just tried that in a virtual environment; it fails when activating the systemd unit:

---snip---
[2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl enable ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
[2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed to connect to bus: No such file or directory
[2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm activate was unable to complete, while creating the OSD
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
    Activate([]).activate(args)
File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
    activate_bluestore(lvs, args.no_systemd)
File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
    systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
    return enable(volume_unit % (device_type, id_, fsid))
File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
    process.run(['systemctl', 'enable', unit])
File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 1
[2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] will rollback OSD ID creation
[2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8 --yes-i-really-mean-it
[2021-05-26 06:47:17,597][ceph_volume.process][INFO  ] stderr purged osd.8
---snip---
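Regarding the missing bootstrap-osd keyring itself: assuming the client.bootstrap-osd key still exists in the cluster, you should be able to export it onto the OSD node with something along these lines (a rough sketch; run it from a node with an admin keyring, and the target path may differ in your setup):

---snip---
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
---snip---

A minimal ceph.conf with the mon addresses also has to be readable from wherever you run ceph-volume, otherwise it can't reach the monitors at all (that's the 'No valid ceph configuration file was loaded' error from your first attempt).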

There's a workaround for the activate issue described in [2], but that's not really an option for dozens of OSDs. I think your best approach is to get cephadm to deploy and activate the OSDs for you. You wrote that you didn't find any helpful error messages, but did cephadm even try to deploy OSDs? What does your OSD spec file look like? Did you explicitly run 'ceph orch apply osd -i specfile.yml'? That should trigger cephadm, and you should see at least some output like this (I've sketched a rough example spec below the log excerpt):

Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service osd.ssd-hdd-mix on host pacific2...
Mai 26 08:21:49 pacific1 conmon[31009]: cephadm 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 : cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
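For reference, an OSD spec for hosts like yours could look roughly like this; the service_id, host pattern and device filter are only placeholders you'd adjust to your environment, 'rotational: 1' selects the spinning disks, and 'encrypted: true' corresponds to the --dmcrypt you used with ceph-volume:

---snip---
service_type: osd
service_id: drywood-hdds
placement:
  host_pattern: 'drywood*'
spec:
  data_devices:
    rotational: 1
  encrypted: true
---snip---

Applying that with 'ceph orch apply osd -i specfile.yml' should make cephadm create and start the OSD daemons on all matching hosts and devices.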

Regards,
Eugen

[1] https://tracker.ceph.com/issues/49159
[2] https://tracker.ceph.com/issues/46691


Quoting Peter Childs <pchilds@xxxxxxx>:

Not sure what I'm doing wrong; I suspect it's the way I'm running
ceph-volume.

root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: -->  RuntimeError: No valid ceph configuration file was loaded.
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 8029, in <module>
    main()
  File "/usr/sbin/cephadm", line 8017, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1678, in _infer_fsid
    return func(ctx)
  File "/usr/sbin/cephadm", line 1738, in _infer_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
    out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
  File "/usr/sbin/cephadm", line 1464, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t

root@drywood12:~# cephadm shell
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Inferring config
/var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
root@drywood12:/# lsblk /dev/sda
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0  7.3T  0 disk

As far as I can see, cephadm gets a little further than this, since the disks
end up with LVM volumes on them; it's just that the OSD daemons are never
created or started. So maybe I'm invoking ceph-volume incorrectly.
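Or should I just be letting the orchestrator create the OSDs for me, with
something like the following (I'm guessing at the exact syntax here)?

ceph orch daemon add osd drywood12:/dev/sda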


On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds@xxxxxxx> wrote:



On Mon, 24 May 2021, 21:08 Marc, <Marc@xxxxxxxxxxxxxxxxx> wrote:

>
> I'm attempting to use cephadm and Pacific, currently on Debian Buster,
> mostly because centos7 ain't supported any more and centos8 ain't
> supported by some of my hardware.

Who says centos7 is not supported any more? AFAIK centos7/el7 is supported
until its EOL in 2024. By then maybe a good alternative to el8/stream will
have surfaced.


Not supported by Ceph Pacific; it's our OS of choice otherwise.

My testing says the versions of podman, docker and python3 available there
do not work with Pacific.

Given that I've needed to upgrade docker on Buster, can we please have a list
of versions that work with cephadm, and maybe even have cephadm refuse to run
("no, please upgrade") unless you're running the right version or better.



> Anyway I have a few nodes with 59x 7.2TB disks, but for some reason the
> OSD daemons don't start; the disks get formatted and the OSDs are created,
> but the daemons never come up.

What if you try with 'ceph-volume lvm create --data /dev/sdi --dmcrypt'?


I'll have a go.


> They are probably the wrong spec for ceph (48 GB of memory and only 4
> cores)

You can always start with just configuring a few disks per node. That
should always work.


That was my thought too.

Thanks

Peter


> but I was expecting them to start and be either dirt slow or crash
> later; anyway I've got up to 30 of them, so I was hoping to get at least
> 6PB of raw storage out of them.
>
> As yet I've not spotted any helpful error messages.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


