On Thu, Oct 25, 2018 at 11:08 PM Noah Watkins <nwatkins@xxxxxxxxxx> wrote:
>
> After speaking with Alfredo and the orchestrator team, it seems there
> are some open questions (well, maybe just questions whose answers need
> to be written down) about OSD removal with ceph-volume.
>
> Feel free to expand the scope of this thread to the many different
> destruction / deactivation scenarios, but we have been driven
> initially by the conversion of one ceph-ansible playbook that removes
> a specific OSD from the cluster that boils down to:
>
> 1. ceph-disk deactivate --deactivate-by-id ID --mark-out
> 2. ceph-disk destroy --destroy-by-id ID --zap
> 3. < manually destroy partitions from `ceph-disk list` >
>
> To accomplish the equivalent without ceph-disk we are doing the following:
>
> 1. ceph osd out ID
> 2. systemctl disable ceph-osd@ID
> 3. systemctl stop ceph-osd@ID
> 4. something equivalent to:
> | osd_devs = ceph-volume lvm list --format json
> | for dev in osd_devs[ID]:
> |     ceph-volume lvm zap dev["path"]
> 5. ceph osd purge ID
>
> This list seems to be complete after examining ceph docs and
> ceph-volume itself. Is there anything missing? Similar questions here:
> http://tracker.ceph.com/issues/22287
>
> Of these steps, the primary question that has popped up is how to
> maintain, outside of ceph-volume, the inverse of the systemd unit
> management that ceph-volume takes care of during OSD creation (e.g.
> ceph-osd and ceph-volume units), and whether that inverse operation
> should be a part of ceph-volume itself.

My suggestion would be to have a separation of the three aspects of
creating/destroying OSDs:
A) The drive/volume manipulation part (ceph-volume)
B) Enabling/disabling execution of the ceph-osd process (systemd,
containers, something else...)
C) The updates to Ceph cluster maps (ceph osd purge, ceph osd destroy, etc.)

The thing that ties all three together would live up at the ceph-mgr
layer, exposed through a high-level UI (the dashboard and the new CLI
bits).

That isn't to exclude having functionality in ceph-volume where it's a
useful convenience (e.g. systemd), but in general ceph-volume can't be
expected to know how to start OSD services in, e.g., Kubernetes
environments.

John

> My understanding of the systemd process for ceph is that the
> ceph-volume unit itself activates the corresponding OSD using the
> ceph-osd systemd template, so there aren't any OSD-specific unit files
> to clean up when an OSD is removed. That still leaves the question of
> how to properly remove the ceph-volume units if that is indeed the
> process that needs to occur. Glancing over the zap code, it doesn't
> look like zap handles that task. Related tracker here:
> http://tracker.ceph.com/issues/25029
>
> In the ceph docs it seems to only indicate that the OSD needs to be
> stopped, and presumably there are other final clean-up steps?
>
>
> - Noah
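For reference, the five removal steps Noah lists can be sketched in Python, with each step grouped under the layer John proposes: (A) volume manipulation, (B) process execution, (C) cluster-map updates. This is a minimal sketch, not a tool that ships with Ceph. It assumes the output of `ceph-volume lvm list --format json` is an object keyed by OSD id (as a string) whose values are lists of entries carrying a `"path"` key, as in the quoted pseudocode. The names `zap_commands`, `removal_plan` and `execute` are hypothetical. It adds `--yes-i-really-mean-it` to `ceph osd purge`, which that command requires as a confirmation, and it deliberately does not touch the ceph-volume systemd units, since how to remove those is the open question in this thread.

```python
import json
import subprocess
from typing import Dict, List

# One command is an argv list, suitable for subprocess.run().
Command = List[str]


def zap_commands(osd_id: int, lvm_list_json: str) -> List[Command]:
    """(A) Volume layer: zap every device ceph-volume reports for this OSD.

    Assumption: the JSON is keyed by OSD id as a string, and each entry
    has a "path" key (mirrors the pseudocode in the thread).
    """
    osd_devs = json.loads(lvm_list_json)
    return [["ceph-volume", "lvm", "zap", dev["path"]]
            for dev in osd_devs.get(str(osd_id), [])]


def removal_plan(osd_id: int, lvm_list_json: str) -> Dict[str, List[Command]]:
    """Order the removal steps and tag each with the layer that owns it."""
    unit = f"ceph-osd@{osd_id}"
    return {
        # (C) cluster map: mark the OSD out so its PGs are remapped elsewhere
        "mark_out": [["ceph", "osd", "out", str(osd_id)]],
        # (B) process execution: systemd here; a container runtime would differ
        "stop_service": [["systemctl", "disable", unit],
                         ["systemctl", "stop", unit]],
        # (A) drive/volume manipulation
        "zap": zap_commands(osd_id, lvm_list_json),
        # (C) cluster map: purge removes the OSD from CRUSH, auth and the OSD map
        "purge": [["ceph", "osd", "purge", str(osd_id),
                   "--yes-i-really-mean-it"]],
    }


def execute(plan: Dict[str, List[Command]], dry_run: bool = True) -> None:
    """Print each command (dry run) or run it, stopping on the first failure."""
    for step, commands in plan.items():  # dicts keep insertion order
        for argv in commands:
            if dry_run:
                print(f"{step}: {' '.join(argv)}")
            else:
                subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical LV path, for illustration only.
    sample = json.dumps({"3": [{"path": "/dev/ceph-vg/osd-block-3"}]})
    execute(removal_plan(3, sample))
```

With `dry_run=True` (the default) the plan is only printed, so it can be reviewed before touching a real cluster. Keeping the layers as separate entries is what would let an orchestrator swap out only the `stop_service` step for, say, a Kubernetes deployment.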