Re: ceph-volume simple disk scenario without LVM for OSD on PVC

On Fri, Dec 6, 2019 at 9:39 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> On Fri, 6 Dec 2019, Alfredo Deza wrote:
> > On Fri, Dec 6, 2019 at 8:31 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > >
> > > My thoughts here are still pretty inconclusive...
> > >
> > > I agree that we should invest in a non-LVM mode, but there is currently
> > > no way to do that with dm-crypt support that isn't complicated and
> > > convoluted, so it cannot be a full replacement for the LVM mode.
> >
> > The `ceph-volume simple` sub-command does allow dmcrypt. The key is
> > stored in the JSON file in /etc/ceph/osd.
> >
> > Is there a scenario you've seen where this is not possible? The
> > `simple` sub-command would even allow partitions (regardless of
> > ceph-disk).
>
> For the dm-crypt case, I'm assuming we need the key to be attached to the
> device in some way.

We don't need it attached to the device. It just happens that ceph-disk
had a partition where it would store the key, and ceph-volume is able to
"scan" the device, retrieve the key, and save it in the JSON file.
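
For anyone curious, here is a rough sketch (in Python, purely to
illustrate the shape) of what one of those /etc/ceph/osd JSON files could
look like for an encrypted OSD. The field names below are an
approximation of what `ceph-volume simple scan` records, not the exact
schema:

    import json

    # Illustrative metadata for one scanned OSD; field names approximate.
    osd_metadata = {
        "cluster_name": "ceph",
        "whoami": 3,                  # OSD id
        "fsid": "<osd fsid>",         # placeholder, not a real uuid
        "type": "bluestore",
        "encrypted": 1,
        "encryption_type": "luks",    # or "plain" for plain dm-crypt
        "data": {"path": "/dev/sdb2", "uuid": "<partuuid>"},
    }

    # ceph-volume stores one such file per OSD, named after the OSD id
    # and fsid, e.g. /etc/ceph/osd/3-<osd fsid>.json
    print(json.dumps(osd_metadata, indent=4))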

> LVM does this with another LV (IIRC); ceph-disk did
> it with a tiny partition.  Putting it in /etc/ceph means you can't move a
> disk to another server without manually copying files around.

You are right about LVM and ceph-disk. What I'm trying to make clear
is: it is entirely possible, and supported, to have an encrypted OSD
via the `simple` sub-command, with the key in the JSON file.

If you need to move disks around, this will not work (I would love not
to support this at all, as it is an optimization for small clusters).

If the OS dies with the keys, then yes, you would need to have those
files replaced somehow. In the case of containers, the files already
live somewhere else (the host), and in the specific case of Rook, I
want to reiterate: it is possible, and supported, to have encrypted
OSDs with the key in the JSON file.
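
To make the "moving disks" caveat concrete: after physically moving the
disk, the JSON metadata (which carries the dm-crypt key) has to be copied
to the new host by hand before the OSD can start. A minimal sketch of the
re-activation step, assuming the standard `ceph-volume simple activate
<id> <fsid>` invocation and a JSON layout like the one sketched above:

    import json
    import subprocess

    def activate_moved_osd(json_path: str) -> None:
        # Assumes the JSON file was already copied from the old host
        # into /etc/ceph/osd on this one.
        with open(json_path) as f:
            meta = json.load(f)
        osd_id = meta["whoami"]       # field names assumed, see above
        osd_fsid = meta["fsid"]
        # `simple activate` reads the JSON, decrypts with the stored key
        # if the OSD is encrypted, then mounts and starts the OSD.
        subprocess.run(
            ["ceph-volume", "simple", "activate", str(osd_id), osd_fsid],
            check=True,
        )

    # e.g.: activate_moved_osd("/etc/ceph/osd/3-<osd fsid>.json")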

>
> sage
>
> >
> > >
> > > At the same time, Real Soon Now we're going to be building crimson OSDs
> > > backed by ZNS SSDs (and eventually persistent memory), which will also
> > > very clearly not be LVM-based.  I'm a bit hesitant to introduce a
> > > bare-bones bluestore mode right now just because we'll be adding yet
> > > another variation soon, and it may be that we construct a general approach
> > > to both... but probably not.  And the whole point of c-v's architecture
> > > was to be pluggable.
> > >
> > > So maybe a bare-bones bluestore mode makes sense.  In the simple case, it
> > > really should be *very* simple.  But its scope pretty quickly explodes:
> > > what about wal and db devices?  We have labels for those, so we could
> > > support those easily as well... provided the user partitions the devices
> > > beforehand manually.  They'll immediately want to use the new
> > > auto/batch thing, but that's tied to the LVM implementation.  And what
> > > if one of the db/wal/main devices is an LV and another is not?  We'd
> > > need to make sure the lvm mode machinery doesn't trigger unless all of
> > > its labels are there, but it might be confusing.  All of which means that
> > > this is probably *only* useful for single-device OSDs.  On the one hand,
> > > those are increasingly common (hello, all-SSD clusters), but on the other
> > > hand, for fast SSDs we may want to deploy N of them per device.
> > >
> > > Since we can't cover all of that, and at a minimum, we can't cover
> > > dm-crypt, Rook will need to work with the lvm mode one way or another.
> > > So we need to have a wrapper (or something similar) no matter what.  So I
> > > suggest we start there.
> > >
> > > sage
> > >
> > >
> > > On Fri, 6 Dec 2019, Sebastien Han wrote:
> > >
> > > > Hi Kai,
> > > >
> > > > Thanks!
> > > > –––––––––
> > > > Sébastien Han
> > > > Senior Principal Software Engineer, Storage Architect
> > > >
> > > > "Always give 100%. Unless you're giving blood."
> > > >
> > > > On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
> > > > >
> > > > > Hi Sebastien and thanks for your feedback.
> > > > >
> > > > > On 06.12.19 10:00, Sebastien Han wrote:
> > > > > > ceph-volume is a sunk cost!
> > > > > > And your argument basically falls into that paradigm: "oh, we have
> > > > > > invested so much already that we cannot stop, and we should continue
> > > > > > even though this will only bring more trouble". That is an inability
> > > > > > to accept the sunk cost.
> > > > > > Think of all the issues that have been fixed with a lot of pain;
> > > > > > all that pain could have been avoided if LVM weren't there, and
> > > > > > pursuing that direction will only lead us to more pain again.
> > > > >
> > > > > The reason I disagree here is the scenario where the WAL/DB is on a
> > > > > separate device and a single OSD crashes. In that case you would like to
> > > > > recreate just that single OSD instead of the whole group. Also, if we
> > > > > deprecate a tool like we did with ceph-disk, users have to migrate
> > > > > sooner or later if they don't want to do everything manually on the CLI
> > > > > (by that I mean via fdisk/pure LVM commands and so on).
> > > > >
> > > > > We could argue now that this can still be done on the command line
> > > > > manually, but all our efforts are towards simplicity/automation and
> > > > > having everything in the Dashboard. If the underlying tool/functionality
> > > > > isn't there anymore, that isn't possible.
> > > > >
> > > >
> > > > I understand your position. Yes, when we start separating block/db/wal,
> > > > things get really complex; that's why I'm sticking with block/db/wal on
> > > > the same block device.
> > > > Also, we haven't seen any request for separating those when running
> > > > OSDs on PVC in the cloud, so we will likely continue to do so for a
> > > > while.
> > > >
> > > > > > Also, I'm not saying we should replace the tool, but that we
> > > > > > should allow not using LVM for a simple scenario to start with.
> > > > >
> > > > > Which then leads me to: why couldn't such functionality be implemented
> > > > > in a single tool instead of ending up with two?
> > > > >
> > > > > So don't get me wrong, I'm not saying that I'm against everything; I'm
> > > > > just saying that I think this is a topic that should be discussed in
> > > > > more depth.
> > > >
> > > > Yes, that's for sure.
> > > >
> > > > >
> > > > > As said, just my two cents here.
> > > > >
> > > > > Kai
> > > > >
> > > > > --
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
> > > > > Geschäftsführer: Felix Imendörffer (HRB 36809, AG Nürnberg)
> > > > >
> > > > >
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



