Re: How to correctly purge a "ceph-volume lvm" OSD

On Mon, Feb 26, 2018 at 11:24 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
> If we're asking for documentation updates, the man page for ceph-volume is
> incredibly outdated.  In 12.2.3 it still says that bluestore is not yet
> implemented and that it's planned to be supported.
> '[--bluestore] filestore objectstore (not yet implemented)'
> 'using  a  filestore  setup (bluestore  support  is  planned)'.

This is a bit hard to track because ceph-deploy is an out-of-tree
project that gets pulled into the Ceph repo, and the man page lives in
the Ceph source tree.

We have updated the man page and the references to ceph-deploy to
correctly show the new API and all of the supported flags, but this is
in master and has not been backported to luminous.

>
> On Mon, Feb 26, 2018 at 7:05 AM Oliver Freyermuth
> <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Am 26.02.2018 um 13:02 schrieb Alfredo Deza:
>> > On Sat, Feb 24, 2018 at 1:26 PM, Oliver Freyermuth
>> > <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
>> >> Dear Cephalopodians,
>> >>
>> >> when purging a single OSD on a host (created via ceph-deploy 2.0, i.e.
>> >> using ceph-volume lvm), I currently proceed as follows:
>> >>
>> >> On the OSD-host:
>> >> $ systemctl stop ceph-osd@4.service
>> >> $ ls -la /var/lib/ceph/osd/ceph-4
>> >> # Check the block and block.db links:
>> >> lrwxrwxrwx.  1 ceph ceph   93 23. Feb 01:28 block ->
>> >> /dev/ceph-69b1fbe5-f084-4410-a99a-ab57417e7846/osd-block-cd273506-e805-40ac-b23d-c7b9ff45d874
>> >> lrwxrwxrwx.  1 root root   43 23. Feb 01:28 block.db ->
>> >> /dev/ceph-osd-blockdb-ssd-1/db-for-disk-sda
>> >> # resolve actual underlying device:
>> >> $ pvs | grep ceph-69b1fbe5-f084-4410-a99a-ab57417e7846
>> >>   /dev/sda   ceph-69b1fbe5-f084-4410-a99a-ab57417e7846 lvm2 a--
>> >> <3,64t     0
>> >> # Zap the device:
>> >> $ ceph-volume lvm zap --destroy /dev/sda
>> >>
>> >> Now, on the mon:
>> >> # purge the OSD:
>> >> $ ceph osd purge osd.4 --yes-i-really-mean-it
>> >>
>> >> Then I re-deploy using:
>> >> $ ceph-deploy --overwrite-conf osd create --bluestore --block-db
>> >> ceph-osd-blockdb-ssd-1/db-for-disk-sda --data /dev/sda osd001
>> >>
>> >> from the admin-machine.
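The stop/zap/purge sequence above can be sketched as a small helper; this is a minimal sketch, assuming OSD id 4 and data device /dev/sda, and assuming the purge step is run where an admin keyring is available (the thread runs it on the mon). The function name purge_osd is made up for illustration; with DRY_RUN=1 it only prints the commands instead of running them:

```shell
# Sketch of the purge sequence described above; purge_osd is a
# hypothetical helper, not part of ceph-volume.
purge_osd() {
  local id="$1" dev="$2"
  # run either executes the command or, with DRY_RUN=1, just prints it
  run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi; }
  run systemctl stop "ceph-osd@${id}.service"            # stop the OSD daemon
  run ceph-volume lvm zap --destroy "$dev"               # wipe the LVs and device
  run ceph osd purge "osd.${id}" --yes-i-really-mean-it  # remove from the cluster
}
```

For example, `DRY_RUN=1 purge_osd 4 /dev/sda` prints the three commands without executing anything.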
>> >>
>> >> This works just fine, however, it leaves a stray ceph-volume service
>> >> behind:
>> >> $ ls -la /etc/systemd/system/multi-user.target.wants/ -1 | grep
>> >> ceph-volume@lvm-4
>> >> lrwxrwxrwx.  1 root root   44 24. Feb 18:30
>> >> ceph-volume@lvm-4-5a984083-48e1-4c2f-a1f3-3458c941e597.service ->
>> >> /usr/lib/systemd/system/ceph-volume@.service
>> >> lrwxrwxrwx.  1 root root   44 23. Feb 01:28
>> >> ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service ->
>> >> /usr/lib/systemd/system/ceph-volume@.service
>> >>
>> >> This stray service then, after reboot of the machine, stays in
>> >> activating state (since the disk will of course never come back):
>> >> -----------------------------------
>> >> $ systemctl status
>> >> ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
>> >> ● ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service - Ceph
>> >> Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>> >>    Loaded: loaded (/usr/lib/systemd/system/ceph-volume@.service;
>> >> enabled; vendor preset: disabled)
>> >>    Active: activating (start) since Sa 2018-02-24 19:21:47 CET; 1min
>> >> 12s ago
>> >>  Main PID: 1866 (timeout)
>> >>    CGroup:
>> >> /system.slice/system-ceph\x2dvolume.slice/ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
>> >>            ├─1866 timeout 10000 /usr/sbin/ceph-volume-systemd
>> >> lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>> >>            └─1872 /usr/bin/python2.7 /usr/sbin/ceph-volume-systemd
>> >> lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874
>> >>
>> >> Feb 24 19:21:47 osd001.baf.physik.uni-bonn.de systemd[1]: Starting Ceph
>> >> Volume activation: lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874...
>> >> -----------------------------------
>> >> Manually, I can fix this by running:
>> >> $ systemctl disable
>> >> ceph-volume@lvm-4-cd273506-e805-40ac-b23d-c7b9ff45d874.service
>> >>
>> >> My question is: Should I really remove that manually?
>> >> Should "ceph-volume lvm zap --destroy" have taken care of it (bug)?
>> >
>> > You should remove it manually. The problem with zapping is that we
>> > might not have the information needed to remove the systemd unit.
>> > Since an OSD can be made up of several devices, ceph-volume may be
>> > asked to "zap" a device for which it cannot determine the owning OSD.
>> > The systemd units are tied to the ID and UUID of the OSD.
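Since zap cannot always map a device back to its OSD, the stray unit has to be identified from the OSD's ID and fsid. A minimal sketch that reconstructs the unit name in the format seen in the listing above (the helper name stray_unit_name is made up for illustration):

```shell
# The enabled unit follows the pattern
# ceph-volume@lvm-<osd-id>-<osd-fsid>.service (see the ls output above).
stray_unit_name() {
  local id="$1" fsid="$2"
  echo "ceph-volume@lvm-${id}-${fsid}.service"
}

# For the OSD in this thread, the cleanup would then be:
#   systemctl disable "$(stray_unit_name 4 cd273506-e805-40ac-b23d-c7b9ff45d874)"
```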
>>
>> Understood, thanks for the reply!
>>
>> Could this be added to the documentation at some point for all the other
>> users operating the cluster manually / with ceph-deploy?
>> This would likely be best to prevent others from falling into this trap
>> ;-).
>> Should I open a ticket asking for this?
>>
>> Cheers,
>>         Oliver
>>
>> >
>> >
>> >> Am I missing a step?
>> >>
>> >> Cheers,
>> >>         Oliver
>> >>
>> >>
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>>



