To put this in context, the goal here is to kill ceph-disk in mimic.

One proposal is to make it so new OSDs can *only* be deployed with LVM,
and old OSDs with the ceph-disk GPT partitions would be started via
ceph-volume support that can only start (but not deploy new) OSDs in
that style.

Is the LVM-only-ness concerning to anyone?

Looking further forward, NVMe OSDs will probably be handled a bit
differently, as they'll eventually be using SPDK and kernel-bypass
(hence, no LVM).  For the time being, though, they would use LVM.

On Fri, 6 Oct 2017, Alfredo Deza wrote:
> Now that ceph-volume is part of the Luminous release, we've been able
> to provide filestore support for LVM-based OSDs. We are making use of
> LVM's powerful mechanisms to store metadata, which allows the process
> to no longer rely on UDEV and GPT labels (unlike ceph-disk).
>
> Bluestore support should be the next step for `ceph-volume lvm`, and
> while that is planned we are thinking of ways to improve the current
> caveats (like OSDs not coming up) for clusters that have deployed OSDs
> with ceph-disk.
>
> --- New clusters ---
> The `ceph-volume lvm` deployment is straightforward (currently
> supported in ceph-ansible), but there isn't support for plain disks
> (with partitions) currently, like there is with ceph-disk.
>
> Is there a pressing interest in supporting plain disks with
> partitions? Or is only supporting LVM-based OSDs fine?

Perhaps the "out" here is to support a "dir" option where the user can
manually provision and mount an OSD on /var/lib/ceph/osd/*, with
'journal' or 'block' symlinks, and ceph-volume will do the last bits
that initialize the filestore or bluestore OSD from there.  Then if
someone has a scenario that isn't captured by LVM (or whatever else we
support) they can always do it manually?

> --- Existing clusters ---
> Migration to ceph-volume, even with plain disk support, means
> re-creating the OSD from scratch, which would end up moving data.
> There is no way to make a GPT/ceph-disk OSD become a ceph-volume one
> without starting from scratch.
>
> A temporary workaround would be to provide a way for existing OSDs to
> be brought up without UDEV and ceph-disk, by creating logic in
> ceph-volume that could load them with systemd directly. This wouldn't
> make them lvm-based, nor would it mean there is direct support for
> them, just a temporary workaround to make them start without UDEV and
> ceph-disk.
>
> I'm interested in what current users might look for here: is it fine
> to provide this workaround if the issues are that problematic? Or is
> it OK to plan a migration towards ceph-volume OSDs?

IMO we can't require any kind of data migration in order to upgrade,
which means we either have to (1) keep ceph-disk around indefinitely, or
(2) teach ceph-volume to start existing GPT-style OSDs.  Given all of
the flakiness around udev, I'm partial to #2.

The big question for me is whether #2 alone is sufficient, or whether
ceph-volume should also know how to provision new OSDs using partitions
and no LVM.  Hopefully not?

sage
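To make the "new clusters" piece concrete, the current `ceph-volume lvm`
filestore flow looks roughly like the following (the VG/LV and journal
device names are placeholders, and the exact flags and tag names are
from memory, so treat this as a sketch rather than gospel):

  # prepare an OSD on a pre-created logical volume; the journal can be
  # another LV or a raw partition
  ceph-volume lvm prepare --filestore --data vg_osd/lv_osd0 --journal /dev/sdc1

  # the OSD metadata lives in LVM tags instead of GPT labels/udev rules,
  # e.g. ceph.osd_id, ceph.osd_fsid, ceph.type
  lvs -o lv_name,lv_tags

  # activate mounts the OSD and starts it via systemd
  ceph-volume lvm activate 0 <osd-fsid>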
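For the "dir" idea above, the user-facing half might look something like
this (devices, paths, and especially the final ceph-volume step are
hypothetical, just to illustrate where the line would be drawn):

  # user provisions and mounts the OSD directory themselves
  mkfs.xfs /dev/sdd1
  mkdir -p /var/lib/ceph/osd/ceph-7
  mount /dev/sdd1 /var/lib/ceph/osd/ceph-7
  ln -s /dev/sde1 /var/lib/ceph/osd/ceph-7/journal   # filestore
  # (or a 'block' symlink instead, for bluestore)

  # hypothetical last-mile step: ceph-volume only does the OSD
  # initialization from here; no such subcommand exists today
  ceph-volume dir prepare /var/lib/ceph/osd/ceph-7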
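And for #2, the per-OSD work ceph-volume would be taking over from
udev/ceph-disk boils down to roughly this (device and OSD id are
placeholders):

  # mount the ceph-disk data partition where the OSD expects it
  mount /dev/sdb1 /var/lib/ceph/osd/ceph-3

  # start the daemon directly via systemd, with no udev involvement
  systemctl start ceph-osd@3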