On Sat, Feb 24, 2018 at 1:26 PM, Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
> Dear Cephalopodians,
>
> When purging a single OSD on a host (created via ceph-deploy 2.0, i.e. using ceph-volume lvm), I currently proceed as follows:
>
> On the OSD host:
> $ systemctl stop ceph-osd@4.service
> $ ls -la /var/lib/ceph/osd/ceph-4
> # Check the block and block.db links:
> lrwxrwxrwx. 1 ceph ceph 93 23. Feb 01:28 block -> /dev/ceph-69b1fbe5-f084-4410-a99a-ab57417e7846/osd-block-cd273506-e805-40ac-b23d-c7b9ff45d874
> lrwxrwxrwx. 1 root root 43 23. Feb 01:28 block.db -> /dev/ceph-osd-blockdb-ssd-1/db-for-disk-sda
> # Resolve the actual underlying device:
> $ pvs | grep ceph-69b1fbe5-f084-4410-a99a-ab57417e7846
>   /dev/sda  ceph-69b1fbe5-f084-4410-a99a-ab57417e7846 lvm2 a--  <3,64t  0
> # Zap the device:
> $ ceph-volume lvm zap --destroy /dev/sda
>
> Now, on the mon:
> # Purge the OSD:
> $ ceph osd purge osd.4 --yes-i-really-mean-it
>
> Then I re-deploy from the admin machine using:
> $ ceph-deploy --overwrite-conf osd create --bluestore --block-db ceph-osd-blockdb-ssd-1/db-for-disk-sda --data /dev/sda osd001
>
> This works just fine; however, it leaves a stray ceph-volume service behind:
> $ ls -la /etc/systemd/system/multi-user.target.wants/ -1 | grep ceph-volume@lvm-4
> lrwxrwxrwx. 1 root root 44 24. Feb 18:30 ceph-volume@lvm-4-5a984083-48e1-4c2f-a1f3-3458c941e597.service -> /usr/lib/systemd/system/ceph-volume@.service
> lrwxrwxrwx. 1 root root 44 23. Feb 01:28 ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service -> /usr/lib/systemd/system/ceph-volume@.service
>
> After a reboot of the machine, this stray service stays in "activating" state (since the disk will of course never come back):
> -----------------------------------
> $ systemctl status ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
> ● ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service - Ceph Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>    Loaded: loaded (/usr/lib/systemd/system/ceph-volume@.service; enabled; vendor preset: disabled)
>    Active: activating (start) since Sa 2018-02-24 19:21:47 CET; 1min 12s ago
>  Main PID: 1866 (timeout)
>    CGroup: /system.slice/system-ceph\x2dvolume.slice/ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
>            ├─1866 timeout 10000 /usr/sbin/ceph-volume-systemd lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>            └─1872 /usr/bin/python2.7 /usr/sbin/ceph-volume-systemd lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>
> Feb 24 19:21:47 osd001.baf.physik.uni-bonn.de systemd[1]: Starting Ceph Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874...
> -----------------------------------
> Manually, I can fix this by running:
> $ systemctl disable ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
>
> My question is: Should I really remove that manually?
> Should "ceph-volume lvm zap --destroy" have taken care of it (bug)?

You should remove it manually. The problem with zapping is that we might not have the information needed to remove the systemd unit: since an OSD can be made out of several devices, ceph-volume may be asked to "zap" a device without being able to determine which OSD it belongs to. The systemd units, however, are tied to the ID and UUID of the OSD.

> Am I missing a step?
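
As for the manual cleanup, something like this should do it on the OSD host. This is just a sketch based on the unit names from your listing; "ceph-volume lvm list" should tell you which "osd fsid" belongs to the osd.4 that is deployed right now, and any other unit carrying that OSD id can be disabled:

$ ceph-volume lvm list | grep -E 'osd (id|fsid)'
$ ls -1 /etc/systemd/system/multi-user.target.wants/ | grep 'ceph-volume@lvm-4-'
# disable the unit(s) whose fsid does not match the live OSD; in your
# listing that is the cd273506-... one:
$ systemctl disable ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service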
>
> Cheers,
> Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com