Re: Adding OSDs

Since you’re using cephadm, you should stick to ceph orch device ls (to see which devices are available) and then work with drive_groups. To clean up the disks, you can remove all VGs and LVs whose names start with “ceph”. I would do this on the OSD nodes where you tried to create OSDs:

- ceph-volume lvm zap --destroy {vg-name}/{lv-name}
- if anything with “ceph” in its name is still listed in ‘vgs’, remove it
- remove PVs that were previously allocated for ceph
- check if your disks are listed as available in ‘ceph orch device ls’
- create a yaml file for your drive_groups (see the sketch below)
- ceph orch apply -i drive_groups.yml
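
For illustration, the cleanup on an affected OSD node could look roughly like this (the VG/LV and device names are only placeholders, adjust them to your setup):

---snip---
ceph-volume lvm zap --destroy {vg-name}/{lv-name}   # wipe the leftover OSD LV
vgs                                                 # check for remaining ceph-* VGs
vgremove {leftover-ceph-vg}                         # remove anything ceph-related that is left
pvremove /dev/sdX                                   # release PVs previously allocated for ceph
---snip---

and a drive_groups.yml along these lines (rotational disks as data devices, SSDs for DB; the service_id and host_pattern are just examples):

---snip---
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
---snip---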

That should do it, at least from my experience.



Quoting Will Payne <will@xxxxxxxxxxxxxxxx>:

On 11 Jun 2020, at 15:21, Eugen Block <eblock@xxxxxx> wrote:

Can you share which guide and deployment strategy you're following? I didn't have any issues deploying either completely manually [3] or with cephadm.


I followed the cephadm guide at https://ceph.readthedocs.io/en/latest/cephadm/install/ as the docs said it was the recommended method.


The OSD nodes need a keyring to bootstrap OSDs:



Mmkay, done that. Now I get:

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a7e9874e-883d-458c-ae27-d828c6d262da
Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-a7e9874e-883d-458c-ae27-d828c6d262da ceph-7df92c86-ad9b-4421-974e-381c61a7d9d1
 stderr: Volume group "ceph-7df92c86-ad9b-4421-974e-381c61a7d9d1" not found
  Cannot process volume group ceph-7df92c86-ad9b-4421-974e-381c61a7d9d1
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: purged osd.0
-->  RuntimeError: command returned non-zero exit status: 5


.. now, this is probably my fault from too much tinkering. Where is this VG name coming from? Can I get back to a clean slate? (Or at least remove references to non-existing VGs?)

Will


---snip---
admin:~ # ceph auth get client.bootstrap-osd
exported keyring for client.bootstrap-osd
[client.bootstrap-osd]
       key = AQCtsWNdAAAAABAAV6g3yc7rSa0yvsfO1Xlj5w==
       caps mgr = "allow r"
       caps mon = "allow profile bootstrap-osd"

osd1:~ # cat  /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
       key = AQCtsWNdAAAAABAAV6g3yc7rSa0yvsfO1Xlj5w==
       caps mgr = "allow r"
       caps mon = "allow profile bootstrap-osd"
---snip---
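
For reference, that keyring is simply the exported client.bootstrap-osd key. Something along these lines, run on a node that has an admin keyring and then copied to the OSD host, should produce the same file:

ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring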


The manual deployment guide describes how to create OSDs from scratch. I would recommend setting up a cluster manually to get familiar with all the components that a deployment tool like cephadm will manage for you.


Again, can WAL and DB be pointed at the same SSD?

Yes, in that case you don't need to specify a WAL device; it will automatically be placed on the DB device.
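
For the ceph-volume variant that means something like this (no --block.wal, assuming the DB LV vgname/db-sdc already exists) should be enough:

ceph-volume lvm create --data /dev/sdc --block.db vgname/db-sdc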


[3] https://docs.ceph.com/docs/octopus/install/index_manual/#deploy-a-cluster-manually


Quoting Will Payne <will@xxxxxxxxxxxxxxxx>:

If you want to specify vgname/lvname you have to create them manually and run:

ceph-volume lvm create --data /dev/sdc --block.db vgname/db-sdc --block.wal vgname/wal-sdc
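
For illustration, the manual preparation could look roughly like this (the device name /dev/sdk and the LV sizes are just examples):

pvcreate /dev/sdk                    # initialize the SSD as an LVM physical volume
vgcreate vgname /dev/sdk             # VG that will hold the DB/WAL LVs
lvcreate -n db-sdc -L 60G vgname     # DB LV for /dev/sdc
lvcreate -n wal-sdc -L 2G vgname     # WAL LV for /dev/sdc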

Apt helpfully told me I need to install ceph-osd, which I did. The ceph-volume command then told me:

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new ...
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id

I note the keyring file doesn't exist - is this something I need to create, or should it exist already? Or will it be created by this command?

ceph-volume lvm batch /dev/sdb ... /dev/sdj --db-devices /dev/sdk --wal-devices /dev/sdl, but then you don't get to choose vgname/lvname; the batch command creates the volume groups and logical volumes for you.

I’m not too fussed about the vg/lv names if they’re created for me. Can the DB and WAL devices be the same device?

I would say the drive_groups approach is designed to automate things so you don't need to worry about creating VGs and LVs.

Again, can WAL and DB be pointed at the same SSD?


Sorry if I’m being stupid, the documentation just seems a little lacking.

Will





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



