On Fri, Oct 26, 2018 at 11:00 AM Jan Fajerski <jfajerski@xxxxxxxx> wrote:
>
> On Fri, Oct 26, 2018 at 08:06:34AM -0400, Alfredo Deza wrote:
> >On Fri, Oct 26, 2018 at 7:11 AM John Spray <jspray@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Oct 25, 2018 at 11:08 PM Noah Watkins <nwatkins@xxxxxxxxxx> wrote:
> >> >
> >> > After speaking with Alfredo and the orchestrator team, it seems there
> >> > are some open questions (well, maybe just questions whose answers need
> >> > to be written down) about OSD removal with ceph-volume.
> >> >
> >> > Feel free to expand the scope of this thread to the many different
> >> > destruction / deactivation scenarios, but we have been driven
> >> > initially by the conversion of one ceph-ansible playbook that removes
> >> > a specific OSD from the cluster, which boils down to:
> >> >
> >> >   1. ceph-disk deactivate --deactivate-by-id ID --mark-out
> >> >   2. ceph-disk destroy --destroy-by-id ID --zap
> >> >   3. < manually destroy partitions from `ceph-disk list` >
> >> >
> >> > To accomplish the equivalent without ceph-disk we are doing the following:
> >> >
> >> >   1. ceph osd out ID
> >> >   2. systemctl disable ceph-osd@ID
> >> >   3. systemctl stop ceph-osd@ID
> >> >   4. something equivalent to:
> >> >        | osd_devs = ceph-volume lvm list --format json
> >> >        | for dev in osd_devs[ID]:
> >> >        |     ceph-volume lvm zap dev["path"]
> >> >   5. ceph osd purge ID
> >> >
> >> > This list seems to be complete after examining the ceph docs and
> >> > ceph-volume itself. Is there anything missing? Similar questions here:
> >> > http://tracker.ceph.com/issues/22287
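
Step 4 could be made concrete with something along these lines; this is only a
rough, untested sketch (it assumes jq is available, and the OSD id 3 is a
placeholder):

    # zap every device backing the given OSD id (placeholder id: 3)
    ceph-volume lvm list --format json \
      | jq -r --arg id 3 '.[$id][]?.path' \
      | while read -r dev; do
          ceph-volume lvm zap "$dev"
        done
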
> >> >
> >> > Of these steps, the primary question that has popped up is how to
> >> > maintain, outside of ceph-volume, the inverse of the systemd unit
> >> > management that ceph-volume takes care of during OSD creation (e.g.
> >> > the ceph-osd and ceph-volume units), and whether that inverse operation
> >> > should be a part of ceph-volume itself.
> >>
> >> My suggestion would be to have a separation of the three aspects of
> >> creating/destroying OSDs:
> >> A) The drive/volume manipulation part (ceph-volume)
> >> B) Enabling/disabling execution of the ceph-osd process (systemd,
> >> containers, something else...)
> >> C) The updates to Ceph cluster maps (ceph osd purge, ceph osd destroy, etc.)
> >>
> >> The thing that ties all three together would live up at the ceph-mgr
> >> layer, where a high level UI (the dashboard and new CLI bits) would
> >> tie it all together.
> >
> >This proposed separation is at odds with what ceph-volume does today.
> >All three happen when provisioning an OSD. Not having some counterpart
> >for deactivation would cause similar confusion as today: why does enabling
> >happen in ceph-volume while disabling/deactivation does not?
> >
> >>
> >> That isn't to exclude having functionality in ceph-volume where it's a
> >> useful convenience (e.g. systemd), but in general ceph-volume can't be
> >> expected to know how to start OSD services in e.g. Kubernetes
> >> environments.
> >
> >The same could be said about provisioning. How does ceph-volume know
> >how to provision an OSD in Kubernetes? It doesn't. What we do there
> >is enable certain functionality that containers can make use of, for
> >example doing all the activation but skipping the systemd enabling.
> >
> >There are a couple of reasons why 'deactivate' hasn't made it into
> >ceph-volume. One of them is that it wasn't clear (to me) if
> >deactivation meant full removal/purging of the OSD or if
> >it meant leaving it in a state where it wouldn't start (e.g.
> >disabling the systemd units).
> >
> >My guess is that there is a need for both, and for a few more use
> >cases, like disabling the systemd unit so that the same OSD can be
> >provisioned. So far we've concentrated on the creation of OSDs,
> >surpassing ceph-disk features, but I think that we can start exploring
> >the complexity of deactivation now.
> Yeah, that would be great. I was wondering about lvm management that might
> relate to this. Afaiu (and please correct me if I'm wrong) c-v does some
> basic lvm management when a block device is passed as --data, but to get an
> lv as a wal/db device it must be created beforehand.
> Would it make sense to add a dedicated lvm management layer to c-v, or was
> this ruled out long ago?

We have! It is now part of the `ceph-volume lvm batch` sub-command, which will
create everything for you given an input of devices:

http://docs.ceph.com/docs/master/ceph-volume/lvm/batch/

> I think this could also have benefits for other operations regarding lv's,
> like renaming and growing an lv (I believe Igor was looking into growing a
> wal/db lv and then growing BlueFS after that).
>
> Best,
> Jan
>
>
> >
> >>
> >> John
> >>
> >> > My understanding of the systemd process for ceph is that the
> >> > ceph-volume unit itself activates the corresponding OSD using the
> >> > ceph-osd systemd template--so there aren't any osd-specific unit files
> >> > to clean up when an OSD is removed. That still leaves the question of
> >> > how to properly remove the ceph-volume units, if that is indeed the
> >> > process that needs to occur. Glancing over the zap code, it doesn't
> >> > look like zap handles that task. Related tracker here:
> >> > http://tracker.ceph.com/issues/25029
> >> >
> >> > In the ceph docs it seems to only indicate that the OSD needs to be
> >> > stopped, and presumably there are other final clean-up steps?
> >> >
> >> >
> >> > - Noah
> >
>
> --
> Jan Fajerski
> Engineer Enterprise Storage
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
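
A footnote on the unit cleanup question quoted above: ceph-volume enables a
per-OSD instance of the ceph-volume@ template at creation time, so a manual
inverse of that enable step might look roughly like the sketch below. The OSD
id 3 and the fsid are placeholders (both are shown by `ceph-volume lvm list`),
and the instance naming is an assumption based on how ceph-volume names the
units it enables, not a documented removal procedure:

    # sketch only; 3 and the fsid are placeholders from `ceph-volume lvm list`
    systemctl stop ceph-osd@3
    systemctl disable ceph-osd@3
    # assumption: the unit instance is named <subcommand>-<osd id>-<osd fsid>
    systemctl disable ceph-volume@lvm-3-f3b7f566-cf6e-4a36-9a2b-0e0f1d2c3b4a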