Thanks David. We will investigate the bugs as per your suggestion, and then will look to test with the custom image. Appreciate it. On Sat, May 29, 2021, 4:11 PM David Orman <ormandj@xxxxxxxxxxxx> wrote: > You may be running into the same issue we ran into (make sure to read > the first issue, there's a few mingled in there), for which we > submitted a patch: > > https://tracker.ceph.com/issues/50526 > https://github.com/alfredodeza/remoto/issues/62 > > If you're brave (YMMV, test first non-prod), we pushed an image with > the issue we encountered fixed as per above here: > https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 . We > 'upgraded' to this when we encountered the mgr hanging on us after > updating ceph to v16 and experiencing this issue using: "ceph orch > upgrade start --image docker.io/ormandj/ceph:v16.2.3-mgrfix". I've not > tried to bootstrap a new cluster with a custom image, and I don't know > when 16.2.4 will be released with this change (hopefully) integrated, > as remoto accepted the patch upstream. > > I'm not sure if this is your exact issue; read the bug reports and check > whether you see the lock and whether the behavior matches. If so, it may help you > out. The only change in that image is that patch to remoto being > overlaid on the default 16.2.3 image. > > On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo <marcopizzolo@xxxxxxxxx> > wrote: > > > > Peter, > > > > We're seeing the same issues as you are. We have 2 new hosts Intel(R) > > Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED > > drives, and we have tried both 15.2.13 and 16.2.4. > > > > Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with > > Docker. > > > > Seems to be a bug in Cephadm and a product regression, as we have 4 near > > identical nodes on Centos running Nautilus (240 x 10TB SED drives) and > had > > no problems. > > > > FWIW we had no luck yet with one-by-one OSD daemon additions through ceph > > orch either. We also reproduced the issue easily in a virtual lab using > > small virtual disks on a single ceph VM with 1 mon. > > > > We are now looking into whether we can get past this with a manual > buildout. > > > > If you, or anyone, has hit the same stumbling block and gotten past it, I > > would really appreciate some guidance. > > > > Thanks, > > Marco > > > > On Thu, May 27, 2021 at 2:23 PM Peter Childs <pchilds@xxxxxxx> wrote: > > > > > > In the end it looks like I might be able to get the node up to about 30 > > > osds before it stops creating any more. > > > > > > Or rather, it formats the disks but freezes up starting the daemons. > > > > > > I suspect I'm missing something I can tune to get it working better. > > > > > > If I could see any error messages that might help, but I'm yet to spot > > > anything. > > > > > > Peter. > > > > > > On Wed, 26 May 2021, 10:57 Eugen Block, <eblock@xxxxxx> wrote: > > > > > > > > If I add the osd daemons one at a time with > > > > > > > > > > ceph orch daemon add osd drywood12:/dev/sda > > > > > > > > > > It does actually work, > > > > > > > > Great! > > > > > > > > > I suspect what's happening is when my rule for creating osds runs > and > > > > > creates them all at once, it overloads cephadm and > it > > > > can't > > > > > cope. > > > > > > > > It's possible, I guess. > > > > > > > > > I suspect what I might need to do at least to work around the > issue is > > > > set > > > > > "limit:" and bring it up until it stops working.
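For illustration, a sketch of what such a capped spec could look like; the "limit:" filter under data_devices is an assumption taken from the drive group documentation rather than from this thread, and the value 6 is just a starting point to raise or lower until OSD creation stalls again:

service_type: osd
service_name: osd.drywood-disks
placement:
  host_pattern: 'drywood*'
spec:
  data_devices:
    size: "7TB:"
    limit: 6        # assumed filter: only consume the first 6 matching devices per host
  objectstore: bluestore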
> > > > > > > > It's worth a try, yes, although the docs state you should try to > avoid > > > > it; it's possible that it doesn't work properly, and in that case create > a > > > > bug report. ;-) > > > > > > > > > I did work out how to get ceph-volume to nearly work manually. > > > > > > > > > > cephadm shell > > > > > ceph auth get client.bootstrap-osd -o > > > > > /var/lib/ceph/bootstrap-osd/ceph.keyring > > > > > ceph-volume lvm create --data /dev/sda --dmcrypt > > > > > > > > > > but given I've now got "add osd" to work, I suspect I just need to > fine > > > > > tune my osd creation rules, so it does not try and create too many > osds > > > > on > > > > > the same node at the same time. > > > > > > > > I agree, no need to do it manually if there is an automated way, > > > > especially if you're trying to bring up dozens of OSDs. > > > > > > > > > > > > Zitat von Peter Childs <pchilds@xxxxxxx>: > > > > > > > > > After a bit of messing around, I managed to get it somewhat > working. > > > > > > > > > > If I add the osd daemons one at a time with > > > > > > > > > > ceph orch daemon add osd drywood12:/dev/sda > > > > > > > > > > It does actually work, > > > > > > > > > > I suspect what's happening is when my rule for creating osds runs > and > > > > > creates them all at once, it overloads cephadm and > it > > > > can't > > > > > cope. > > > > > > > > > > service_type: osd > > > > > service_name: osd.drywood-disks > > > > > placement: > > > > > host_pattern: 'drywood*' > > > > > spec: > > > > > data_devices: > > > > > size: "7TB:" > > > > > objectstore: bluestore > > > > > > > > > > I suspect what I might need to do at least to work around the > issue is > > > > set > > > > > "limit:" and bring it up until it stops working. > > > > > > > > > > I did work out how to get ceph-volume to nearly work manually. > > > > > > > > > > cephadm shell > > > > > ceph auth get client.bootstrap-osd -o > > > > > /var/lib/ceph/bootstrap-osd/ceph.keyring > > > > > ceph-volume lvm create --data /dev/sda --dmcrypt > > > > > > > > > > but given I've now got "add osd" to work, I suspect I just need to > fine > > > > > tune my osd creation rules, so it does not try and create too many > osds > > > > on > > > > > the same node at the same time. > > > > > > > > > > > > > > > > > > > > On Wed, 26 May 2021 at 08:25, Eugen Block <eblock@xxxxxx> wrote: > > > > > > > > > >> Hi, > > > > >> > > > > >> I believe your current issue is due to a missing keyring for > > > > >> client.bootstrap-osd on the OSD node. But even after fixing that > > > > >> you probably still won't be able to deploy an OSD manually with > > > > >> ceph-volume because 'ceph-volume activate' is not supported with > > > > >> cephadm [1].
I just tried that in a virtual environment, it fails > when > > > > >> activating the systemd-unit: > > > > >> > > > > >> ---snip--- > > > > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running > > > > >> command: /usr/bin/systemctl enable > > > > >> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456 > > > > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr > Failed > > > > >> to connect to bus: No such file or directory > > > > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] > lvm > > > > >> activate was unable to complete, while creating the OSD > > > > >> Traceback (most recent call last): > > > > >> File > > > > >> > "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", > > > > >> line 32, in create > > > > >> Activate([]).activate(args) > > > > >> File > "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", > > > > >> line 16, in is_root > > > > >> return func(*a, **kw) > > > > >> File > > > > >> > > > "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", > > > > >> line > > > > >> 294, in activate > > > > >> activate_bluestore(lvs, args.no_systemd) > > > > >> File > > > > >> > > > "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", > > > > >> line > > > > >> 214, in activate_bluestore > > > > >> systemctl.enable_volume(osd_id, osd_fsid, 'lvm') > > > > >> File > > > > >> > "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", > > > > >> line 82, in enable_volume > > > > >> return enable(volume_unit % (device_type, id_, fsid)) > > > > >> File > > > > >> > "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", > > > > >> line 22, in enable > > > > >> process.run(['systemctl', 'enable', unit]) > > > > >> File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", > > > > >> line 153, in run > > > > >> raise RuntimeError(msg) > > > > >> RuntimeError: command returned non-zero exit status: 1 > > > > >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] > will > > > > >> rollback OSD ID creation > > > > >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running > > > > >> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd > > > > >> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new > osd.8 > > > > >> --yes-i-really-mean-it > > > > >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO ] stderr > purged > > > > osd.8 > > > > >> ---snip--- > > > > >> > > > > >> There's a workaround described in [2] that's not really an option > for > > > > >> dozens of OSDs. I think your best approach is to bring cephadm to > > > > >> activate the OSDs for you. > > > > >> You wrote you didn't find any helpful error messages, but did > cephadm > > > > >> even try to deploy OSDs? What does your osd spec file look like? > Did > > > > >> you explicitly run 'ceph orch apply osd -i specfile.yml'? This > should > > > > >> trigger cephadm and you should see at least some output like this: > > > > >> > > > > >> Mai 26 08:21:48 pacific1 conmon[31446]: > 2021-05-26T06:21:48.466+0000 > > > > >> 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service > > > > >> osd.ssd-hdd-mix on host pacific2... > > > > >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm > > > > >> 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) > 1646 : > > > > >> cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2... 
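If nothing like that shows up, a couple of commands can help confirm whether cephadm even picked the spec up; this is a hedged sketch assuming the Pacific cephadm tooling, with specfile.yml standing in for whatever the real spec file is called:

ceph orch apply osd -i specfile.yml --dry-run   # preview which disks the spec would consume, without deploying
ceph log last cephadm                           # show recent cephadm entries from the cluster log
ceph -W cephadm                                 # follow the cephadm log channel while the spec is applied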
> > > > >> > > > > >> Regards, > > > > >> Eugen > > > > >> > > > > >> [1] https://tracker.ceph.com/issues/49159 > > > > >> [2] https://tracker.ceph.com/issues/46691 > > > > >> > > > > >> > > > > >> Zitat von Peter Childs <pchilds@xxxxxxx>: > > > > >> > > > > >> > Not sure what I'm doing wrong, I suspect its the way I'm running > > > > >> > ceph-volume. > > > > >> > > > > > >> > root@drywood12:~# cephadm ceph-volume lvm create --data > /dev/sda > > > > >> --dmcrypt > > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea > > > > >> > Using recent ceph image ceph/ceph@sha256 > > > > >> > > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 > > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool > > > > --gen-print-key > > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool > > > > --gen-print-key > > > > >> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration > file > > > > was > > > > >> > loaded. > > > > >> > Traceback (most recent call last): > > > > >> > File "/usr/sbin/cephadm", line 8029, in <module> > > > > >> > main() > > > > >> > File "/usr/sbin/cephadm", line 8017, in main > > > > >> > r = ctx.func(ctx) > > > > >> > File "/usr/sbin/cephadm", line 1678, in _infer_fsid > > > > >> > return func(ctx) > > > > >> > File "/usr/sbin/cephadm", line 1738, in _infer_image > > > > >> > return func(ctx) > > > > >> > File "/usr/sbin/cephadm", line 4514, in command_ceph_volume > > > > >> > out, err, code = call_throws(ctx, c.run_cmd(), > > > > verbosity=verbosity) > > > > >> > File "/usr/sbin/cephadm", line 1464, in call_throws > > > > >> > raise RuntimeError('Failed command: %s' % ' '.join(command)) > > > > >> > RuntimeError: Failed command: /usr/bin/docker run --rm > --ipc=host > > > > >> > --net=host --entrypoint /usr/sbin/ceph-volume --privileged > > > > >> --group-add=disk > > > > >> > --init -e CONTAINER_IMAGE=ceph/ceph@sha256 > > > :54e95ae1e11404157d7b329d0t > > > > >> > > > > > >> > root@drywood12:~# cephadm shell > > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea > > > > >> > Inferring config > > > > >> > > > > > > /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config > > > > >> > Using recent ceph image ceph/ceph@sha256 > > > > >> > > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949 > > > > >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda > --dmcrypt > > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key > > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key > > > > >> > Running command: /usr/bin/ceph --cluster ceph --name > > > > client.bootstrap-osd > > > > >> > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new > > > > >> > 70054a5c-c176-463a-a0ac-b44c5db0987c > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: > unable > > > to > > > > >> find > > > > >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No > such > > > > file > > > > >> or > > > > >> > directory > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 > > > > >> > AuthRegistry(0x7fdef405b378) no keyring found at > > > > >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: > unable > > > to > > > > >> find > > > > >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No > such > > > > file > > > > >> or > > > > >> > directory > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 > > > > >> > 
AuthRegistry(0x7fdef405ef20) no keyring found at > > > > >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: > unable > > > to > > > > >> find > > > > >> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No > such > > > > file > > > > >> or > > > > >> > directory > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 > > > > >> > AuthRegistry(0x7fdef8f0bea0) no keyring found at > > > > >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 > > > > monclient(hunting): > > > > >> > handle_auth_bad_method server allowed_methods [2] but i only > support > > > > [1] > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 > > > > monclient(hunting): > > > > >> > handle_auth_bad_method server allowed_methods [2] but i only > support > > > > [1] > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 > > > > monclient(hunting): > > > > >> > handle_auth_bad_method server allowed_methods [2] but i only > support > > > > [1] > > > > >> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: > > > > >> > authenticate NOTE: no keyring found; disabled cephx > authentication > > > > >> > stderr: [errno 13] RADOS permission denied (error connecting > to the > > > > >> > cluster) > > > > >> > --> RuntimeError: Unable to create a new OSD id > > > > >> > root@drywood12:/# lsblk /dev/sda > > > > >> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > > > > >> > sda 8:0 0 7.3T 0 disk > > > > >> > > > > > >> > As far as I can see cephadm gets a little further than this as > the > > > > disks > > > > >> > have lvm volumes on them, just the osd daemons are not created > or > > > > >> started. > > > > >> > So maybe I'm invoking ceph-volume incorrectly. > > > > >> > > > > > >> > > > > > >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds@xxxxxxx> > wrote: > > > > >> > > > > > >> >> > > > > >> >> > > > > >> >> On Mon, 24 May 2021, 21:08 Marc, <Marc@xxxxxxxxxxxxxxxxx> > wrote: > > > > >> >> > > > > >> >>> > > > > > >> >>> > I'm attempting to use cephadm and Pacific, currently on > debian > > > > >> buster, > > > > >> >>> > mostly because centos7 ain't supported any more and centos8 > > > ain't > > > > >> >>> > supported > > > > >> >>> > by some of my hardware. > > > > >> >>> > > > > >> >>> Who says centos7 is not supported any more? Afaik centos7/el7 > is > > > > being > > > > >> >>> supported till its EOL 2024. By then maybe a good alternative > for > > > > >> >>> el8/stream has surfaced. > > > > >> >>> > > > > >> >> > > > > >> >> Not supported by ceph Pacific, it's our OS of choice otherwise. > > > > >> >> > > > > >> >> My testing says the versions available of podman, docker and > > > python3 > > > > do > > > > >> >> not work with Pacific. > > > > >> >> > > > > >> >> Given I've needed to upgrade docker on buster, can we please > have a > > > > list > > > > >> of > > > > >> >> versions that work with cephadm, maybe even have cephadm say > no, > > > > please > > > > >> >> upgrade unless you're running the right version or better. > > > > >> >> > > > > >> >> > > > > >> >> > > > > >> >>> > Anyway I have a few nodes with 59x 7.2TB disks but for some > > > reason > > > > >> the > > > > >> >>> > osd > > > > >> >>> > daemons don't start, the disks get formatted and the osds are > > > > created > > > > >> but > > > > >> >>> > the daemons never come up.
> > > > >> >>> > > > > >> >>> what if you try with > > > > >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ? > > > > >> >>> > > > > >> >> > > > > >> >> I'll have a go. > > > > >> >> > > > > >> >> > > > > >> >>> > They are probably the wrong spec for ceph (48GB of memory > and > > > > only 4 > > > > >> >>> > cores) > > > > >> >>> > > > > >> >>> You can always start with just configuring a few disks per > node. > > > > That > > > > >> >>> should always work. > > > > >> >>> > > > > >> >> > > > > >> >> That was my thought too. > > > > >> >> > > > > >> >> Thanks > > > > >> >> > > > > >> >> Peter > > > > >> >> > > > > >> >> > > > > >> >>> > but I was expecting them to start and be either dirt slow or > > > crash > > > > >> >>> > later, > > > > >> >>> > anyway I've got up to 30 of them, so I was hoping to get > at > > > > least > > > > >> > > > > >> >>> > 6PB of raw storage out of them. > > > > >> >>> > > > > > >> >>> > As yet I've not spotted any helpful error messages. > > > > >> >>> > > > > > >> >>> _______________________________________________ > > > > >> >>> ceph-users mailing list -- ceph-users@xxxxxxx > > > > >> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > >> >>> > > > > >> >> > > > > >> > _______________________________________________ > > > > >> > ceph-users mailing list -- ceph-users@xxxxxxx > > > > >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > >> > > > > >> > > > > >> _______________________________________________ > > > > >> ceph-users mailing list -- ceph-users@xxxxxxx > > > > >> To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > >> > > > > > > > > > > > > _______________________________________________ > > > > ceph-users mailing list -- ceph-users@xxxxxxx > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > > > > _______________________________________________ > > > ceph-users mailing list -- ceph-users@xxxxxxx > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users@xxxxxxx > > To unsubscribe send an email to ceph-users-leave@xxxxxxx > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx