Re: ceph-volume simple disk scenario without LVM for OSD on PVC

On Fri, Dec 6, 2019 at 8:31 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> My thoughts here are still pretty inconclusive...
>
> I agree that we should invest in a non-LVM mode, but there isn't a way to do
> that currently that supports dm-crypt that isn't complicated and
> convoluted, so it cannot be a full replacement for the LVM mode.

The `ceph-volume simple` sub-command does allow dmcrypt. The key is
stored in the JSON file in /etc/ceph/osd.

Is there a scenario you've seen where this is not possible? The
`simple` sub-command would even allow partitions (regardless of
ceph-disk).
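
For anyone who wants to check what those files contain on their own hosts, here is a minimal sketch. It assumes only what is stated above: that the per-OSD metadata lives as JSON files under /etc/ceph/osd. The `*.json` glob and the function name `summarize_osd_json` are illustrative assumptions, not part of ceph-volume, and no particular field names are assumed. It prints key names only, never values, since the files may contain the dmcrypt key.

```python
import json
from pathlib import Path


def summarize_osd_json(directory="/etc/ceph/osd"):
    """Map each *.json file name in `directory` to its sorted top-level keys.

    Hypothetical helper for illustration; not part of ceph-volume.
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            data = json.load(handle)
        # Report key names only, never values: the file may hold secrets.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_osd_json().items():
        print(name + ": " + ", ".join(keys))
```

Reading these files usually requires root, and a directory that does not exist simply produces an empty result.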

>
> At the same time, Real Soon Now we're going to be building crimson OSDs
> backed by ZNS SSDs (and eventually persistent memory), which will also
> very clearly not be LVM-based.  I'm a bit hesitant to introduce a
> bare-bones bluestore mode right now just because we'll be adding yet
> another variation soon, and it may be that we construct a general approach
> to both... but probably not.  And the whole point of c-v's architecture
> was to be pluggable.
>
> So maybe a bare-bones bluestore mode makes sense.  In the simple case, it
> really should be *very* simple.  But its scope pretty quickly explodes:
> what about wal and db devices?  We have labels for those, so we could
> support those, also easily... if the user has to partition the devices
> beforehand manually.  They'll immediately want to use the new
> auto/batch thing, but that's tied to the LVM implementation.  And what
> if one of the db/wal/main devices is an LV and another is not?  We'd
> need to make sure the lvm mode machinery doesn't trigger unless all of
> its labels are there, but it might be confusing.  All of which means that
> this is probably *only* useful for single-device OSDs.  On the one hand,
> those are increasingly common (hello, all-SSD clusters), but on the other
> hand, for fast SSDs we may want to deploy N of them per device.
>
> Since we can't cover all of that, and at a minimum, we can't cover
> dm-crypt, Rook will need to behave with the lvm mode one way or another.
> So we need to have a wrapper (or something similar) no matter what.  So I
> suggest we start there.
>
> sage
>
>
> On Fri, 6 Dec 2019, Sebastien Han wrote:
>
> > Hi Kai,
> >
> > Thanks!
> > –––––––––
> > Sébastien Han
> > Senior Principal Software Engineer, Storage Architect
> >
> > "Always give 100%. Unless you're giving blood."
> >
> > On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
> > >
> > > Hi Sebastien and thanks for your feedback.
> > >
> > > On 06.12.19 10:00, Sebastien Han wrote:
> > > > ceph-volume is a sunk cost!
> > > > And your argument basically falls into that paradigm, "oh we have
> > > > invested so much already, that we cannot stop and we should continue
> > > > even though this will only bring more trouble". Incapable of accepting
> > > > this sunk cost.
> > > > All the issues that have been fixed with a lot of pain.
> > > > All that pain could have been avoided if LVM wasn't there and pursuing
> > > > in that direction will only lead us to more pain again.
> > >
> > > The reason I disagree here is the scenario where the WAL/DB is on a
> > > separate device and a single OSD crashes. In that case you would like to
> > > recreate just that single OSD instead of the whole group. Also if we
> > > deprecate a tool like we did with ceph-disk, users have to migrate
> > > sooner or later if they don't want to do everything manually on the CLI
> > > (by that I mean via fdisk/pure lvm commands and so on).
> > >
> > > We could argue now that this can still be done on the command line
> > > manually but all our efforts are towards simplicity/automation and
> > > having everything in the Dashboard. If the underlying tool/functionality
> > > isn't there anymore, that isn't possible.
> > >
> >
> > I understand your position. Yes, when we start separating block/db/wal,
> > things get really complex; that's why I'm sticking with block/db/wal in
> > the same block.
> > Also, we haven't seen any request for separating those when running
> > OSDs on PVC in the Cloud. So we would likely continue to do so for a
> > while.
> >
> > > > Also, I'm not saying we should replace the tool but allow not using
> > > > LVM for a simple scenario to start with
> > >
> > > Which then leads me to, why couldn't such functionality be implemented
> > > into a single tool instead of having two at the end?
> > >
> > > So don't get me wrong, I'm not saying that I'm against everything; I'm
> > > just saying that I think this is a topic that should be discussed in
> > > more depth.
> >
> > Yes, that's for sure.
> >
> > >
> > > As said, just my two cents here.
> > >
> > > Kai
> > >
> > > --
> > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
> > > GF:Geschäftsführer: Felix Imendörffer, (HRB 36809, AG Nürnberg)
> > >
> > >
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx



