In the end it looks like I might be able to get the node up to about 30 OSDs before it stops creating any more. Or rather, it formats the disks but freezes up when starting the daemons. I suspect I'm missing something I can tune to get it working better. If I could see any error messages that might help, but I've yet to spot anything.

Peter.

On Wed, 26 May 2021, 10:57 Eugen Block, <eblock@xxxxxx> wrote:

> > If I add the osd daemons one at a time with
> >
> > ceph orch daemon add osd drywood12:/dev/sda
> >
> > It does actually work,
>
> Great!
>
> > I suspect what's happening is that when my rule for creating OSDs runs and creates them all at once, it ties up the orchestrator and overloads cephadm so it can't cope.
>
> It's possible, I guess.
>
> > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
>
> It's worth a try, yes, although the docs state you should try to avoid it. It's possible that it doesn't work properly; in that case create a bug report. ;-)
>
> > I did work out how to get ceph-volume to nearly work manually.
> >
> > cephadm shell
> > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > ceph-volume lvm create --data /dev/sda --dmcrypt
> >
> > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so it does not try to create too many OSDs on the same node at the same time.
>
> I agree, no need to do it manually if there is an automated way, especially if you're trying to bring up dozens of OSDs.
>
> Zitat von Peter Childs <pchilds@xxxxxxx>:
>
> > After a bit of messing around, I managed to get it somewhat working.
> >
> > If I add the osd daemons one at a time with
> >
> > ceph orch daemon add osd drywood12:/dev/sda
> >
> > It does actually work.
> >
> > I suspect what's happening is that when my rule for creating OSDs runs and creates them all at once, it ties up the orchestrator and overloads cephadm so it can't cope.
> >
> > service_type: osd
> > service_name: osd.drywood-disks
> > placement:
> >   host_pattern: 'drywood*'
> > spec:
> >   data_devices:
> >     size: "7TB:"
> >   objectstore: bluestore
> >
> > I suspect what I might need to do, at least to work around the issue, is set "limit:" and bring it up until it stops working.
> >
> > I did work out how to get ceph-volume to nearly work manually.
> >
> > cephadm shell
> > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > ceph-volume lvm create --data /dev/sda --dmcrypt
> >
> > but given I've now got "add osd" to work, I suspect I just need to fine-tune my osd creation rules so it does not try to create too many OSDs on the same node at the same time.
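
For reference, a minimal sketch of how the "limit:" workaround discussed above might look in the spec quoted above. In the drive group spec, limit sits under data_devices as a device-selection filter, and as noted above the docs advise avoiding it where possible; the value 6 here is only an illustrative starting point to raise step by step:

    service_type: osd
    service_name: osd.drywood-disks
    placement:
      host_pattern: 'drywood*'
    spec:
      data_devices:
        size: "7TB:"
        limit: 6        # illustrative cap on how many matching disks are consumed per host
      objectstore: bluestore

If creation stalls again at a particular value, that at least narrows down how many OSDs per node cephadm can bring up in one go.
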
> >
> > On Wed, 26 May 2021 at 08:25, Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Hi,
> >>
> >> I believe your current issue is due to a missing keyring for client.bootstrap-osd on the OSD node. But even after fixing that you probably still won't be able to deploy an OSD manually with ceph-volume, because 'ceph-volume activate' is not supported with cephadm [1]. I just tried that in a virtual environment; it fails when activating the systemd unit:
> >>
> >> ---snip---
> >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl enable ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
> >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed to connect to bus: No such file or directory
> >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm activate was unable to complete, while creating the OSD
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
> >>     Activate([]).activate(args)
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
> >>     return func(*a, **kw)
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
> >>     activate_bluestore(lvs, args.no_systemd)
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
> >>     systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
> >>     return enable(volume_unit % (device_type, id_, fsid))
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
> >>     process.run(['systemctl', 'enable', unit])
> >>   File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
> >>     raise RuntimeError(msg)
> >> RuntimeError: command returned non-zero exit status: 1
> >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] will rollback OSD ID creation
> >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8 --yes-i-really-mean-it
> >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO ] stderr purged osd.8
> >> ---snip---
> >>
> >> There's a workaround described in [2], but it's not really an option for dozens of OSDs. I think your best approach is to get cephadm to activate the OSDs for you.
> >> You wrote you didn't find any helpful error messages, but did cephadm even try to deploy OSDs? What does your osd spec file look like? Did you explicitly run 'ceph orch apply osd -i specfile.yml'? This should trigger cephadm and you should see at least some output like this:
> >>
> >> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service osd.ssd-hdd-mix on host pacific2...
> >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 : cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
> >>
> >> Regards,
> >> Eugen
> >>
> >> [1] https://tracker.ceph.com/issues/49159
> >> [2] https://tracker.ceph.com/issues/46691
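
To confirm whether cephadm actually picked up the spec, something along these lines should help (the spec file name is only an example here, and --dry-run is the documented way to preview which disks a spec would use):

    ceph orch apply osd -i drywood-osds.yml --dry-run   # preview the OSDs cephadm would create from the spec
    ceph orch apply osd -i drywood-osds.yml             # apply it for real
    ceph orch ls osd                                    # the osd.drywood-disks service should be listed here
    ceph log last cephadm                               # recent cephadm log messages, e.g. the "Applying service" lines above

If the "Applying service" messages never appear, the spec was most likely never applied or didn't match any host or device.
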
> >>
> >> Zitat von Peter Childs <pchilds@xxxxxxx>:
> >>
> >> > Not sure what I'm doing wrong, I suspect it's the way I'm running ceph-volume.
> >> >
> >> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
> >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
> >> > Traceback (most recent call last):
> >> >   File "/usr/sbin/cephadm", line 8029, in <module>
> >> >     main()
> >> >   File "/usr/sbin/cephadm", line 8017, in main
> >> >     r = ctx.func(ctx)
> >> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
> >> >     return func(ctx)
> >> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
> >> >     return func(ctx)
> >> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
> >> >     out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
> >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
> >> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
> >> >
> >> > root@drywood12:~# cephadm shell
> >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> >> > Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
> >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
> >> > Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > Running command: /usr/bin/ceph-authtool --gen-print-key
> >> > Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
> >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
> >> >  stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
> >> > --> RuntimeError: Unable to create a new OSD id
> >> > root@drywood12:/# lsblk /dev/sda
> >> > NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
> >> > sda    8:0    0  7.3T  0 disk
> >> >
> >> > As far as I can see cephadm gets a little further than this, as the disks have LVM volumes on them; it's just that the OSD daemons are not created or started. So maybe I'm invoking ceph-volume incorrectly.
> >> >
> >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds@xxxxxxx> wrote:
> >> >
> >> >> On Mon, 24 May 2021, 21:08 Marc, <Marc@xxxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >>> > I'm attempting to use cephadm and Pacific, currently on Debian buster, mostly because centos7 isn't supported any more and centos8 isn't supported by some of my hardware.
> >> >>>
> >> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is being supported till its EOL 2024. By then maybe a good alternative for el8/stream has surfaced.
> >> >>
> >> >> Not supported by Ceph Pacific; it's our OS of choice otherwise.
> >> >>
> >> >> My testing says the versions of podman, docker and python3 available do not work with Pacific.
> >> >>
> >> >> Given I've needed to upgrade docker on buster, can we please have a list of versions that work with cephadm, and maybe even have cephadm say "no, please upgrade" unless you're running the right version or better.
> >> >>
> >> >>> > Anyway I have a few nodes with 59x 7.2TB disks, but for some reason the osd daemons don't start; the disks get formatted and the OSDs are created, but the daemons never come up.
> >> >>>
> >> >>> what if you try with
> >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
> >> >>
> >> >> I'll have a go.
> >> >>
> >> >>> > They are probably the wrong spec for ceph (48GB of memory and only 4 cores)
> >> >>>
> >> >>> You can always start with just configuring a few disks per node. That should always work.
> >> >>
> >> >> That was my thought too.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Peter
> >> >>
> >> >>> > but I was expecting them to start and be either dirt slow or crash later; anyway I've got up to 30 of them, so I was hoping to get at least 6PB of raw storage out of them.
> >> >>> >
> >> >>> > As yet I've not spotted any helpful error messages.
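
On the question of where error messages for the stuck OSDs might show up: assuming a cephadm-managed Pacific cluster, these are the usual places to look (the daemon name osd.12 is only an example; commands as described in the cephadm troubleshooting docs):

    ceph orch device ls --wide               # whether cephadm considers the disks available, and why not if rejected
    ceph -W cephadm                          # follow the orchestrator log live while the OSDs are being created
    cephadm logs --name osd.12               # on the OSD host: the journal of one daemon that failed to start
    journalctl -u ceph-<fsid>@osd.12         # the same via systemd directly

If the daemons really freeze while starting, the per-daemon logs from the last two commands are the most likely place to find a useful message.
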
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx