On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
>Hi,
>
>We keep finding part-made OSDs (they appear not attached to any host,
>and down and out; but still counting towards the number of OSDs); we
>never saw this with ceph-disk. On investigation, this is because
>ceph-volume lvm create makes the OSD (ID and auth at least) too early in
>the process and is then unable to roll-back cleanly (because the
>bootstrap-osd credential isn't allowed to remove OSDs).
>
>As an example (very truncated):
>
>Running command: /usr/bin/ceph --cluster ceph --name
>client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
>-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
>Running command: vgcreate --force --yes
>ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
> stderr: Device /dev/sdbh not found (or ignored by filtering).
> Unable to add physical volume '/dev/sdbh' to volume group
>'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
>--> Was unable to complete a new OSD, will rollback changes
>--> OSD will be fully purged from the cluster, because the ID was generated
>Running command: ceph osd purge osd.828 --yes-i-really-mean-it
> stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
>a keyring on
>/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>(2) No such file or directory
> stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
>authenticate NOTE: no keyring found; disabled cephx authentication
>2019-09-10 15:07:53.397334 7fbca2caf700 0 librados: client.admin
>authentication error (95) Operation not supported
>
>This is annoying to have to clear up, and it seems to me could be
>avoided by either:
>
>i) ceph-volume should (attempt to) set up the LVM volumes &c before
>making the new OSD id
>or
>ii) allow the bootstrap-osd credential to purge OSDs
>
>i) seems like clearly the better answer...?

Agreed.
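
To illustrate option i), here is a minimal Python sketch of the ordering. It is
not ceph-volume's actual code: every name in it (create_osd, StepFailed, the
step functions) is hypothetical, and the steps only append to a log instead of
calling vgcreate or ceph. The idea is to run the local LVM step first and the
cluster-side "osd new" step last, so that a failed vgcreate leaves nothing in
the cluster that would need a purge.

```python
# Hypothetical sketch: none of these names come from ceph-volume itself.

class StepFailed(Exception):
    """Raised with the name of the step that could not be completed."""


def create_osd(steps):
    """Run (name, do, undo) steps in order.

    If a step fails, undo the steps that already completed, in reverse
    order, then raise StepFailed. Returns the names of completed steps.
    """
    done = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            for _, prev_undo in reversed(done):
                prev_undo()
            raise StepFailed(name) from exc
        done.append((name, undo))
    return [name for name, _ in done]


log = []  # stands in for the commands that would really be run


def prepare_lvm_ok():
    log.append("vgcreate")


def prepare_lvm_fail():
    # Simulates the vgcreate failure on a device that cannot be used.
    raise RuntimeError("device not found (or ignored by filtering)")


def remove_lvm():
    log.append("vgremove")


def register_osd():
    log.append("osd new")


def purge_osd():
    # In the real cluster this undo needs more than bootstrap-osd allows.
    log.append("osd purge")


# LVM first, cluster registration last: a failing LVM step means that
# "osd new" is never issued, so there is no OSD ID to purge afterwards.
ordered_steps_failing = [
    ("lvm", prepare_lvm_fail, remove_lvm),
    ("osd", register_osd, purge_osd),
]
ordered_steps_ok = [
    ("lvm", prepare_lvm_ok, remove_lvm),
    ("osd", register_osd, purge_osd),
]
```

With ordered_steps_failing, create_osd raises StepFailed for the "lvm" step and
the log stays empty, so no OSD ID is ever allocated. Steps that run after
"osd new" could still fail and would need the purge, so this only narrows the
window rather than closing it.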
Would you mind opening a bug report on
https://tracker.ceph.com/projects/ceph-volume?

I have found other situations where a roll-back is not working as it should,
though not with as much impact as this.
>
>Regards,
>
>Matthew
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
(HRB 247165, AG München)
Geschäftsführer: Felix Imendörffer