Hi,

We keep finding part-made OSDs (they appear not attached to any host, and down and out, but still count towards the number of OSDs); we never saw this with ceph-disk. On investigation, this is because ceph-volume lvm create makes the OSD (ID and auth at least) too early in the process and is then unable to roll back cleanly (because the bootstrap-osd credential isn't allowed to remove OSDs).

As an example (very truncated):

    Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
    Running command: vgcreate --force --yes ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
     stderr: Device /dev/sdbh not found (or ignored by filtering).
      Unable to add physical volume '/dev/sdbh' to volume group 'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
    --> Was unable to complete a new OSD, will rollback changes
    --> OSD will be fully purged from the cluster, because the ID was generated
    Running command: ceph osd purge osd.828 --yes-i-really-mean-it
     stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
     stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin authentication error (95) Operation not supported

This is annoying to have to clear up, and it seems to me it could be avoided by either:

i) ceph-volume should (attempt to) set up the LVM volumes etc. before making the new OSD id

or

ii) allow the bootstrap-osd credential to purge OSDs

i) seems like clearly the better answer...?

Regards,

Matthew
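For illustration, the ordering suggested in i) could be sketched roughly as below. This is only a sketch of the idea, not ceph-volume's actual code: create_osd, prepare_lvm, remove_lvm and register_osd are hypothetical names standing in for the LVM setup (vgcreate and friends), its local cleanup, and the "osd new" call respectively.

```python
# Hypothetical sketch: none of these names are real ceph-volume functions.

class PrepareError(Exception):
    """Raised by the (hypothetical) LVM preparation step on failure."""


def create_osd(prepare_lvm, remove_lvm, register_osd):
    """Do the local, reversible LVM work first, and only then allocate
    the cluster-wide OSD id."""
    # Step 1: local setup (e.g. vgcreate). If this raises, nothing has
    # been registered with the cluster, so there is nothing to purge.
    volume = prepare_lvm()
    try:
        # Step 2: allocate the OSD id (e.g. "osd new <uuid>").
        return register_osd(volume)
    except Exception:
        # Undoing step 1 is a purely local operation and does not need
        # any credential that is allowed to remove OSDs.
        remove_lvm(volume)
        raise
```

With this ordering, a failure like the vgcreate error in the log above would abort before any OSD id exists, so no half-made OSD is left behind for someone with an admin keyring to clean up.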
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com