Re: ceph-volume simple disk scenario without LVM for OSD on PVC

With Bluestore I've seen more users collocating everything on the same
device, as the performance is already good enough. This is also
something we have demonstrated in various benchmarks.
So block/db/wal on the same device probably suits most of the users out
there. It drastically reduces the complexity of the setup and also
increases availability (if you lose a shared db/wal device you lose all
the OSDs associated with it, and a lot of people are not ready to take
that risk).
That's why I think this all-in-one OSD mode should be simple and robust,
without an extra LVM layer on top.
Even if we change the OSD store, this will likely stay the same.
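
To make that concrete, a stripped-down prepare for the all-in-one case
could look roughly like the Python sketch below (shelling out to the
usual ceph/ceph-osd commands; the paths are illustrative and the
keyring/monmap handling is left out):

  import os
  import subprocess
  import uuid

  def prepare_allinone_bluestore(device, cluster="ceph"):
      # Rough sketch: create an all-in-one bluestore OSD directly on a
      # raw block device, with no LVM layer in between. Assumes ceph.conf
      # and the bootstrap keyring are already in place.
      osd_uuid = str(uuid.uuid4())

      # Ask the cluster for a new OSD id bound to this uuid.
      osd_id = subprocess.check_output(
          ["ceph", "osd", "new", osd_uuid]).decode().strip()

      # A plain directory plus a symlink to the whole device: block, db
      # and wal all end up on the same device, which is the case that
      # matters here.
      data_dir = "/var/lib/ceph/osd/%s-%s" % (cluster, osd_id)
      os.makedirs(data_dir, exist_ok=True)
      os.symlink(device, os.path.join(data_dir, "block"))

      # Let ceph-osd write the bluestore metadata and the on-disk label.
      subprocess.check_call([
          "ceph-osd", "-i", osd_id, "--mkfs",
          "--osd-objectstore", "bluestore",
          "--osd-data", data_dir,
          "--osd-uuid", osd_uuid,
      ])
      return osd_id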

For more advanced setups, we can keep LVM for now, I suppose...

Thanks!
–––––––––
Sébastien Han
Senior Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Fri, Dec 6, 2019 at 2:31 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> My thoughts here are still pretty inconclusive...
>
> I agree that we should invest in a non-LVM mode, but there currently
> isn't a way to do that with dm-crypt support that isn't complicated and
> convoluted, so it cannot be a full replacement for the LVM mode.
>
> At the same time, Real Soon Now we're going to be building crimson OSDs
> backed by ZNS SSDs (and eventually persistent memory), which will also
> very clearly not be LVM-based.  I'm a bit hesitant to introduce a
> bare-bones bluestore mode right now just because we'll be adding yet
> another variation soon, and it may be that we construct a general approach
> to both... but probably not.  And the whole point of c-v's architecture
> was to be pluggable.
>
> So maybe a bare-bones bluestore mode makes sense.  In the simple case, it
> really should be *very* simple.  But its scope pretty quickly explodes:
> what about wal and db devices?  We have labels for those, so we could
> support them fairly easily too... as long as the user partitions the
> devices beforehand manually.  They'll immediately want to use the new
> auto/batch thing, but that's tied to the LVM implementation.  And what
> if one of the db/wal/main devices is an LV and another is not?  We'd
> need to make sure the lvm mode machinery doesn't trigger unless all of
> its labels are there, but it might be confusing.  All of which means that
> this is probably *only* useful for single-device OSDs.  On the one hand,
> those are increasingly common (hello, all-SSD clusters), but on the other
> hand, for fast SSDs we may want to deploy N of them per device.
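>
> For what it's worth, the probing side is not the hard part; something
> like the sketch below (assuming ceph-bluestore-tool is available and
> that 'show-label' prints the label as JSON) would be enough to tell
> which devices already carry a bluestore label:
>
>   import json
>   import subprocess
>
>   def bluestore_label(device):
>       # Return the bluestore label of a device as a dict, or None if
>       # the device does not carry one.
>       try:
>           out = subprocess.check_output(
>               ["ceph-bluestore-tool", "show-label", "--dev", device],
>               stderr=subprocess.DEVNULL)
>       except subprocess.CalledProcessError:
>           return None
>       return json.loads(out).get(device)
>
>   def all_labeled(devices):
>       # Only treat the set as a non-LVM OSD if every expected device
>       # (block, and optionally db and/or wal) actually carries a
>       # bluestore label.
>       return all(bluestore_label(d) is not None for d in devices)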
>
> Since we can't cover all of that, and at a minimum we can't cover
> dm-crypt, Rook will need to work with the lvm mode one way or another.
> So we need to have a wrapper (or something similar) no matter what.  So I
> suggest we start there.
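>
> A wrapper here can stay very thin; a rough sketch for the single-device
> bluestore case (the flags shown already exist in the lvm mode):
>
>   import subprocess
>
>   def prepare_osd(device, dmcrypt=False):
>       # Thin shim around 'ceph-volume lvm prepare', the kind of wrapper
>       # Rook needs anyway, whatever happens with a raw mode.
>       cmd = ["ceph-volume", "lvm", "prepare", "--bluestore",
>              "--data", device]
>       if dmcrypt:
>           cmd.append("--dmcrypt")
>       subprocess.check_call(cmd)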
>
> sage
>
>
> On Fri, 6 Dec 2019, Sebastien Han wrote:
>
> > Hi Kai,
> >
> > Thanks!
> > –––––––––
> > Sébastien Han
> > Senior Principal Software Engineer, Storage Architect
> >
> > "Always give 100%. Unless you're giving blood."
> >
> > On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
> > >
> > > Hi Sebastien and thanks for your feedback.
> > >
> > > On 06.12.19 10:00, Sebastien Han wrote:
> > > > ceph-volume is a sunk cost!
> > > > And your argument basically falls into that paradigm: "oh, we have
> > > > invested so much already that we cannot stop, we should continue
> > > > even though this will only bring more trouble". That is being unable
> > > > to accept the sunk cost.
> > > > All of those issues were fixed with a lot of pain, pain that could
> > > > have been avoided if LVM wasn't there, and pursuing that direction
> > > > will only lead us to more pain again.
> > >
> > > The reason I disagree here is the scenario where the WAL/DB is on a
> > > separate device and a single OSD crashes. In that case you would like
> > > to recreate just that single OSD instead of the whole group. Also, if
> > > we deprecate a tool like we did with ceph-disk, users will have to
> > > migrate sooner or later if they don't want to do everything manually
> > > on the CLI (by that I mean via fdisk/pure LVM commands and so on).
> > >
> > > We could argue now that this can still be done on the command line
> > > manually, but all our efforts are towards simplicity/automation and
> > > having everything in the Dashboard. If the underlying tool/functionality
> > > isn't there anymore, that isn't possible.
> > >
> >
> > I understand your position. Yes, when we start separating block/db/wal
> > things get really complex, which is why I'm sticking with block/db/wal
> > on the same block device.
> > Also, we haven't seen any requests for separating those when running
> > OSDs on PVC in the cloud, so we will likely continue doing that for a
> > while.
> >
> > > > Also, I'm not saying we should replace the tool, but that we should
> > > > allow not using LVM for a simple scenario to start with.
> > >
> > > Which then leads me to: why couldn't such functionality be implemented
> > > in a single tool instead of ending up with two?
> > >
> > > So don't get me wrong, I'm not saying that I'm against everything.
> > > I'm just saying that I think this is a topic that should be discussed
> > > in more depth.
> >
> > Yes, that's for sure.
> >
> > >
> > > As said, just my two cents here.
> > >
> > > Kai
> > >
> > > --
> > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D-90409 Nürnberg
> > > Geschäftsführer: Felix Imendörffer (HRB 36809, AG Nürnberg)
> > >
> > >
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
> >
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



