Re: [RFE] ceph-volume prepare and activate enhancements for containers

Sebastien Han <shan@xxxxxxxxxx> · Fri, 6 Dec 2019 14:30:48 +0100

I understand this is asking a lot from the ceph-volume side.
We can explore a new wrapper binary, or perhaps do this from ceph-osd itself.

This may be a crazy idea, but can we have a de-activate call from the OSD
process itself? ceph-osd gets SIGTERM, closes the connection to the
device, then runs "vgchange -an <vg>". Is this realistic?
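
To make the ordering concrete, here is a minimal sketch of the wrapper
variant of this idea. It is not an existing tool: the function name
`supervise` and both command arguments are hypothetical, and it assumes
the child releases the device when it exits on SIGTERM. It must be called
from the main thread, since that is the only place Python allows signal
handlers to be installed.

```python
import signal
import subprocess
from typing import Sequence


def supervise(osd_cmd: Sequence[str], deactivate_cmd: Sequence[str]) -> int:
    """Run osd_cmd in the foreground and run deactivate_cmd once it exits.

    Hypothetical helper: osd_cmd stands in for the ceph-osd command line
    and deactivate_cmd for something like ["vgchange", "-an", "<vg>"].
    """
    child = subprocess.Popen(list(osd_cmd))

    def forward_sigterm(signum, frame):
        # Pass the stop request on to the child instead of dying ourselves.
        child.terminate()

    previous = signal.signal(signal.SIGTERM, forward_sigterm)
    try:
        returncode = child.wait()
    finally:
        # Restore the earlier handler if it was one Python knows about.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)

    # The child is gone at this point, so it no longer holds the device.
    subprocess.run(list(deactivate_cmd), check=True)

    # A child that exited cleanly or was stopped by SIGTERM counts as success.
    return 0 if returncode in (0, -signal.SIGTERM) else returncode
```

A caller could end with something like
`sys.exit(supervise(osd_argv, ["vgchange", "-an", vg_name]))`, where
`osd_argv` and `vg_name` are supplied by the deployment.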

Thanks!
–––––––––
Sébastien Han
Senior Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Fri, Dec 6, 2019 at 1:44 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>
> On Fri, Dec 6, 2019 at 5:59 AM Sebastien Han <shan@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > Following up on my previous ceph-volume email as promised.
> >
> > When running Ceph with Rook in Kubernetes in the cloud (AWS, Azure,
> > Google, whatever), the OSDs are backed by PVCs (cloud block storage)
> > attached to virtual machines.
> > This makes the storage portable: if the VM dies, the device will be
> > attached to a new virtual machine and the OSD will resume running.
> >
> > In Rook, we have 2 main deployments for the OSD:
> >
> > 1. Prepare the disk to become an OSD
> > Prepare will run on the VM, attach the block device, and run
> > "ceph-volume prepare". Then this gets complicated: after this, the
> > device is supposed to be detached from the VM because the container
> > terminated. However, the block device is still held by LVM, so the VG
> > must be de-activated. Currently, we do this in Rook, but it would be
> > nice to de-activate the VG once ceph-volume is done preparing the disk
> > in a container.
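
The prepare step above can be sketched as two commands run back to back.
This is only an illustration under stated assumptions: the helper name
`prepare_and_release` and its `run` parameter are hypothetical, the
`--bluestore --data` form of "ceph-volume lvm prepare" is one possible
invocation, and the VG name is passed in by the caller because the sketch
does not discover it.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def prepare_and_release(device: str, vg_name: str,
                        run: Runner = _run_checked) -> List[List[str]]:
    """Prepare `device` as an OSD, then de-activate its VG.

    Hypothetical helper, not part of ceph-volume. If the prepare command
    fails, the exception propagates and the VG is left untouched.
    """
    commands = [
        # Assumed invocation; adjust the flags to the deployment.
        ["ceph-volume", "lvm", "prepare", "--bluestore", "--data", device],
        # Release the LVM hold so the block device can be detached.
        ["vgchange", "-an", vg_name],
    ]
    executed: List[List[str]] = []
    for cmd in commands:
        run(cmd)
        executed.append(cmd)
    return executed
```

Passing a recording function as `run` allows a dry run that shows the
commands without touching any device.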
> >
> > 2. Activate the OSD.
> > Now, on to the new container: the device is attached to the VM again.
> > At this point, more changes will be required in ceph-volume,
> > particularly in the "activate" call.
> >   a. ceph-volume should activate the VG
>
> By VG you mean LVM's Volume Group?
>
> >   b. ceph-volume should activate the device normally
>
> Not "normally" though right? That would imply starting the OSD which
> you are indicating is not desired.
>
> >   c. ceph-volume should run the ceph-osd process in the foreground and
> > accept flags for that CLI. We could have something like:
> > "ceph-volume lvm activate --no-systemd $STORE_FLAG $OSD_ID $OSD_UUID
> > <a bunch of flags>"
> >   Perhaps we need a new flag to indicate we want to run the osd
> > process in foreground?
> >   Here is an example of how an OSD runs today:
> >
> >   ceph-osd --foreground --id 2 --fsid
> > 9a531951-50f2-4d48-b012-0aef0febc301 --setuser ceph --setgroup ceph
> > --crush-location=root=default host=minikube --default-log-to-file
> > false --ms-learn-addr-from-peer=false
> >
> >   --> we can have a bunch of flags or an ENV var with all the flags,
> > whatever you prefer.
> >
> >   This wrapper should watch for signals too. It should respond to
> > SIGTERM in the following way:
> >     - stop the OSD
> >     - de-activate the VG
> >     - exit 0
> >
> > Just a side note, the VG must be de-activated when the container stops
> > so that the block device can be detached from the VMs, otherwise,
> > it'll still be held by LVM.
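
As an illustration of the "ENV var with all the flags" option, the
following sketch builds the three command lines for the activate step. The
function name `build_activation_plan` and the variable name
`CEPH_OSD_EXTRA_ARGS` are hypothetical, and `--bluestore` is only an
assumed default for the store flag. The sketch builds argument lists and
does not execute anything.

```python
import shlex
from typing import List, Mapping


def build_activation_plan(vg_name: str, osd_id: int, osd_uuid: str,
                          env: Mapping[str, str],
                          store_flag: str = "--bluestore") -> List[List[str]]:
    """Return the commands for: activate VG, activate OSD, run ceph-osd.

    Hypothetical helper. Extra ceph-osd flags come from the assumed
    CEPH_OSD_EXTRA_ARGS variable and are split with shell quoting rules.
    """
    extra = shlex.split(env.get("CEPH_OSD_EXTRA_ARGS", ""))
    return [
        # a. activate the LVM volume group
        ["vgchange", "-ay", vg_name],
        # b. activate the OSD without handing it to systemd
        ["ceph-volume", "lvm", "activate", "--no-systemd",
         store_flag, str(osd_id), osd_uuid],
        # c. run the OSD in the foreground with the proxied flags
        ["ceph-osd", "--foreground", "--id", str(osd_id), *extra],
    ]
```

Splitting with `shlex` matters for values that contain spaces, such as the
crush location in the example above: it has to be quoted inside the
variable to survive as a single argument.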
>
> I am worried that this goes beyond what I consider the scope of
> ceph-volume which is: prepare device(s) to be part of an OSD.
>
> Catching signals, handling the OSD in the foreground, and accepting
> (proxying) flags sounds problematic for a robust implementation in
> ceph-volume, even if that means it will help Rook in this case.
>
> The other challenge I see is that it seems Ceph is in a transition
> from being a baremetal project to a container one, except lots of
> tooling (like ceph-volume) is deeply tied to the non-containerized
> workflows. This makes it difficult (and non-obvious!) in ceph-volume
> when adding more flags to do things that help the containerized
> deployment.
>
> To solve the issues you describe, I think you need either a separate
> command-line tool that can invoke ceph-volume with the added features
> you listed, or, if there is significant push to get more things in
> ceph-volume, a separate sub-command, so that the `lvm` sub-command is
> isolated from the conflicting logic.
>
> My preference would be a wrapper script, separate from the Ceph project.
>
> >
> > Hopefully, I was clear :).
> > This is just a proposal; if you feel like this could be done
> > differently, feel free to suggest alternatives.
> >
> > Thanks!
> > –––––––––
> > Sébastien Han
> > Senior Principal Software Engineer, Storage Architect
> >
> > "Always give 100%. Unless you're giving blood."
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
>