Re: Ceph osd will not start.

In the end it looks like I might be able to get the node up to about 30
OSDs before it stops creating any more.

Or rather, it formats the disks but freezes up when starting the daemons.

I suspect I'm missing something I can tune to get it working better.

If I could see any error messages that might help, but I'm yet to spot
anything.
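
In case I'm just looking in the wrong places, the sort of thing I've
been checking so far is roughly:

  ceph orch ps --daemon-type osd
  ceph -W cephadm                       # watch the orchestrator log live
  ceph log last cephadm                 # recent cephadm log messages
  journalctl -u "ceph-<fsid>@osd.<id>"  # on the OSD node itself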

Peter.

On Wed, 26 May 2021, 10:57 Eugen Block, <eblock@xxxxxx> wrote:

> > If I add the osd daemons one at a time with
> >
> > ceph orch daemon add osd drywood12:/dev/sda
> >
> > It does actually work,
>
> Great!
>
> > I suspect what's happening is that when my rule for creating OSDs runs
> > and creates them all at once, it overloads cephadm / the orchestrator
> > and it can't cope.
>
> It's possible, I guess.
>
> > I suspect what I might need to do, at least to work around the issue,
> > is set "limit:" and raise it until it stops working.
>
> It's worth a try, yes, although the docs state you should try to avoid
> it. It's possible that it doesn't work properly; in that case, create a
> bug report. ;-)
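> 
> An untested sketch based on the spec you posted, just adding a limit
> (pick a number that works for your nodes, e.g. 10):
> 
> service_type: osd
> service_name: osd.drywood-disks
> placement:
>   host_pattern: 'drywood*'
> spec:
>   data_devices:
>     size: "7TB:"
>     limit: 10
>   objectstore: bluestore
> 
> With that, cephadm should only pick up to 10 of the matching disks on
> each host, and you can raise the limit step by step as you suggested.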
>
> > I did work out how to get ceph-volume to nearly work manually.
> >
> > cephadm shell
> > ceph auth get client.bootstrap-osd -o
> > /var/lib/ceph/bootstrap-osd/ceph.keyring
> > ceph-volume lvm create --data /dev/sda --dmcrypt
> >
> > but given I've now got "add osd" to work, I suspect I just need to
> > fine-tune my OSD creation rules so they don't try to create too many
> > OSDs on the same node at the same time.
>
> I agree, no need to do it manually if there is an automated way,
> especially if you're trying to bring up dozens of OSDs.
>
>
> Quoting Peter Childs <pchilds@xxxxxxx>:
>
> > After a bit of messing around, I managed to get it somewhat working.
> >
> > If I add the osd daemons one at a time with
> >
> > ceph orch daemon add osd drywood12:/dev/sda
> >
> > It does actually work,
> >
> > I suspect what's happening is that when my rule for creating OSDs runs
> > and creates them all at once, it overloads cephadm / the orchestrator
> > and it can't cope.
> >
> > service_type: osd
> > service_name: osd.drywood-disks
> > placement:
> >   host_pattern: 'drywood*'
> > spec:
> >   data_devices:
> >     size: "7TB:"
> >   objectstore: bluestore
> >
> > I suspect what I might need to do, at least to work around the issue,
> > is set "limit:" and raise it until it stops working.
> >
> > I did work out how to get ceph-volume to nearly work manually.
> >
> > cephadm shell
> > ceph auth get client.bootstrap-osd -o
> > /var/lib/ceph/bootstrap-osd/ceph.keyring
> > ceph-volume lvm create --data /dev/sda --dmcrypt
> >
> > but given I've now got "add osd" to work, I suspect I just need to
> > fine-tune my OSD creation rules so they don't try to create too many
> > OSDs on the same node at the same time.
> >
> >
> >
> > On Wed, 26 May 2021 at 08:25, Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> I believe your current issue is due to a missing keyring for
> >> client.bootstrap-osd on the OSD node. But even after fixing that
> >> you probably still won't be able to deploy an OSD manually with
> >> ceph-volume, because 'ceph-volume activate' is not supported with
> >> cephadm [1]. I just tried that in a virtual environment; it fails
> >> when activating the systemd unit:
> >>
> >> ---snip---
> >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO  ] Running
> >> command: /usr/bin/systemctl enable
> >> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
> >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO  ] stderr Failed
> >> to connect to bus: No such file or directory
> >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm
> >> activate was unable to complete, while creating the OSD
> >> Traceback (most recent call last):
> >>    File
> >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
> >> line 32, in create
> >>      Activate([]).activate(args)
> >>    File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py",
> >> line 16, in is_root
> >>      return func(*a, **kw)
> >>    File
> >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py",
> >> line
> >> 294, in activate
> >>      activate_bluestore(lvs, args.no_systemd)
> >>    File
> >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py",
> >> line
> >> 214, in activate_bluestore
> >>      systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
> >>    File
> >> "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py",
> >> line 82, in enable_volume
> >>      return enable(volume_unit % (device_type, id_, fsid))
> >>    File
> >> "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py",
> >> line 22, in enable
> >>      process.run(['systemctl', 'enable', unit])
> >>    File "/usr/lib/python3.6/site-packages/ceph_volume/process.py",
> >> line 153, in run
> >>      raise RuntimeError(msg)
> >> RuntimeError: command returned non-zero exit status: 1
> >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO  ] will
> >> rollback OSD ID creation
> >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO  ] Running
> >> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> >> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8
> >> --yes-i-really-mean-it
> >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO  ] stderr purged
> osd.8
> >> ---snip---
> >>
> >> There's a workaround described in [2], but that's not really an option
> >> for dozens of OSDs. I think your best approach is to get cephadm to
> >> activate the OSDs for you.
> >> You wrote you didn't find any helpful error messages, but did cephadm
> >> even try to deploy OSDs? What does your osd spec file look like? Did
> >> you explicitly run 'ceph orch apply osd -i specfile.yml'? This should
> >> trigger cephadm and you should see at least some output like this:
> >>
> >> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000
> >> 7effc15ff700  0 log_channel(cephadm) log [INF] : Applying service
> >> osd.ssd-hdd-mix on host pacific2...
> >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm
> >> 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 :
> >> cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
> >>
> >> Regards,
> >> Eugen
> >>
> >> [1] https://tracker.ceph.com/issues/49159
> >> [2] https://tracker.ceph.com/issues/46691
> >>
> >>
> >> Quoting Peter Childs <pchilds@xxxxxxx>:
> >>
> >> > Not sure what I'm doing wrong; I suspect it's the way I'm running
> >> > ceph-volume.
> >> >
> >> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda
> >> --dmcrypt
> >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> >> > Using recent ceph image ceph/ceph@sha256
> >> > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool
> --gen-print-key
> >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool
> --gen-print-key
> >> > /usr/bin/docker: -->  RuntimeError: No valid ceph configuration file
> was
> >> > loaded.
> >> > Traceback (most recent call last):
> >> >   File "/usr/sbin/cephadm", line 8029, in <module>
> >> >     main()
> >> >   File "/usr/sbin/cephadm", line 8017, in main
> >> >     r = ctx.func(ctx)
> >> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
> >> >     return func(ctx)
> >> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
> >> >     return func(ctx)
> >> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
> >> >     out, err, code = call_throws(ctx, c.run_cmd(),
> verbosity=verbosity)
> >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
> >> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> >> > --net=host --entrypoint /usr/sbin/ceph-volume --privileged
> >> --group-add=disk
> >> > --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
> >> >
> >> > root@drywood12:~# cephadm shell
> >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> >> > Inferring config
> >> >
> /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
> >> > Using recent ceph image ceph/ceph@sha256
> >> > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
> >> > Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd
> >> > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> >> > 70054a5c-c176-463a-a0ac-b44c5db0987c
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to
> >> find
> >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
> file
> >> or
> >> > directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> >> > AuthRegistry(0x7fdef405b378) no keyring found at
> >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to
> >> find
> >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
> file
> >> or
> >> > directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> >> > AuthRegistry(0x7fdef405ef20) no keyring found at
> >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to
> >> find
> >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such
> file
> >> or
> >> > directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> >> > AuthRegistry(0x7fdef8f0bea0) no keyring found at
> >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1
> monclient(hunting):
> >> > handle_auth_bad_method server allowed_methods [2] but i only support
> [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1
> monclient(hunting):
> >> > handle_auth_bad_method server allowed_methods [2] but i only support
> [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1
> monclient(hunting):
> >> > handle_auth_bad_method server allowed_methods [2] but i only support
> [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient:
> >> > authenticate NOTE: no keyring found; disabled cephx authentication
> >> >  stderr: [errno 13] RADOS permission denied (error connecting to the
> >> > cluster)
> >> > -->  RuntimeError: Unable to create a new OSD id
> >> > root@drywood12:/# lsblk /dev/sda
> >> > NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
> >> > sda    8:0    0  7.3T  0 disk
> >> >
> >> > As far as I can see, cephadm gets a little further than this, since
> >> > the disks have LVM volumes on them; it's just that the OSD daemons
> >> > are not created or started. So maybe I'm invoking ceph-volume
> >> > incorrectly.
> >> >
> >> >
> >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds@xxxxxxx> wrote:
> >> >
> >> >>
> >> >>
> >> >> On Mon, 24 May 2021, 21:08 Marc, <Marc@xxxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >>> >
> >> >>> > I'm attempting to use cephadm and Pacific, currently on debian
> >> >>> > buster, mostly because centos7 ain't supported any more and
> >> >>> > centos8 ain't supported by some of my hardware.
> >> >>>
> >> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is
> >> >>> supported until its EOL in 2024. By then maybe a good alternative
> >> >>> for el8/stream will have surfaced.
> >> >>>
> >> >>
> >> >> Not supported by Ceph Pacific; it's our OS of choice otherwise.
> >> >>
> >> >> My testing says the versions of podman, docker and python3 that are
> >> >> available do not work with Pacific.
> >> >>
> >> >> Given I've needed to upgrade docker on buster, can we please have a
> >> >> list of versions that work with cephadm, and maybe even have cephadm
> >> >> say "no, please upgrade" unless you're running the right version or
> >> >> better?
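> >> >>
> >> >> For what it's worth, running
> >> >>
> >> >>   cephadm check-host
> >> >>
> >> >> on each node already verifies some of the prerequisites (container
> >> >> engine, time sync and so on), though I don't know whether it checks
> >> >> for minimum versions.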
> >> >>
> >> >>
> >> >>
> >> >>> > Anyway, I have a few nodes with 59x 7.2TB disks, but for some
> >> >>> > reason the OSD daemons don't start: the disks get formatted and
> >> >>> > the OSDs are created, but the daemons never come up.
> >> >>>
> >> >>> what if you try with
> >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
> >> >>>
> >> >>
> >> >> I'll have a go.
> >> >>
> >> >>
> >> >>> > They are probably the wrong spec for Ceph (48 GB of memory and
> >> >>> > only 4 cores)
> >> >>>
> >> >>> You can always start with just configuring a few disks per node.
> That
> >> >>> should always work.
> >> >>>
> >> >>
> >> >> That was my thought too.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Peter
> >> >>
> >> >>
> >> >>> > but I was expecting them to start and be either dirt slow or
> >> >>> > crash later. Anyway, I've got up to 30 of them, so I was hoping
> >> >>> > to get at least 6 PB of raw storage out of them.
> >> >>> >
> >> >>> > As yet I've not spotted any helpful error messages.
> >> >>> >
> >> >>
> >>
> >>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


