Re: ceph-volume simple disk scenario without LVM for OSD on PVC

My thoughts here are still pretty inconclusive...

I agree that we should invest in a non-LVM mode, but there isn't currently 
a way to do that with dm-crypt support that isn't complicated and 
convoluted, so it cannot be a full replacement for the LVM mode.
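
To give a flavour of the "complicated and convoluted" part: a rough 
sketch of just the dm-crypt piece a non-LVM path would have to script 
itself (the keyfile handling and mapper name here are made up, and real 
tooling also has to store and fetch the key somewhere, plus handle 
discovery, activation, and teardown):

    import subprocess

    def encrypt_raw_device(device, keyfile, mapper_name):
        # Format the raw device as a LUKS container using a keyfile.
        subprocess.run(
            ["cryptsetup", "--batch-mode", "--key-file", keyfile,
             "luksFormat", device],
            check=True)
        # Open it so the OSD can then be created on the mapper device.
        subprocess.run(
            ["cryptsetup", "--key-file", keyfile, "luksOpen",
             device, mapper_name],
            check=True)
        return "/dev/mapper/" + mapper_name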

At the same time, Real Soon Now we're going to be building crimson OSDs 
backed by ZNS SSDs (and eventually persistent memory), which will also 
very clearly not be LVM-based.  I'm a bit hesitant to introduce a 
bare-bones bluestore mode right now just because we'll be adding yet 
another variation soon, and it may be that we construct a general approach 
to both... but probably not.  And the whole point of c-v's architecture 
was to be pluggable.

So maybe a bare-bones bluestore mode makes sense.  In the simple case, it 
really should be *very* simple.  But its scope pretty quickly explodes: 
what about wal and db devices?  We have labels for those, so we could 
support them fairly easily too... if the user partitions the devices 
beforehand manually.  They'll immediately want to use the new 
auto/batch thing, but that's tied to the LVM implementation.  And what 
if one of the db/wal/main devices is an LV and another is not?  We'd 
need to make sure the lvm mode machinery doesn't trigger unless all of 
its labels are there, but it might be confusing.  All of which means that 
this is probably *only* useful for single-device OSDs.  On the one hand, 
those are increasingly common (hello, all-SSD clusters), but on the other 
hand, for fast SSDs we may want to deploy N of them per device.
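
For illustration only, a minimal sketch of the detection side, assuming 
we lean on ceph-bluestore-tool show-label (which prints the device label 
as JSON keyed by the device path) and hand-waving the error handling: a 
bare-bones mode should only claim devices that actually carry a 
bluestore label, and leave everything else alone:

    import json
    import subprocess

    def is_bluestore_device(device):
        # A device already prepared as bare bluestore carries a label that
        # show-label can read; on anything else (blank disk, LVM PV, some
        # filesystem) the tool should exit non-zero.
        try:
            out = subprocess.run(
                ["ceph-bluestore-tool", "show-label", "--dev", device],
                capture_output=True, check=True)
        except subprocess.CalledProcessError:
            return False
        return device in json.loads(out.stdout)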

Since we can't cover all of that, and at a minimum we can't cover 
dm-crypt, Rook will need to work with the lvm mode one way or another.  
So we need to have a wrapper (or something similar) no matter what.  So I 
suggest we start there.
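
As a strawman of what I mean by a wrapper (the function name is 
invented, and a real wrapper would also drive activation and surface 
errors properly), something this thin over the existing lvm-mode flags 
already covers the single-device case:

    import subprocess

    def prepare_osd(device, dmcrypt=False):
        # Let ceph-volume's lvm mode do the heavy lifting instead of
        # reimplementing it for the single-device bluestore case.
        cmd = ["ceph-volume", "lvm", "prepare", "--bluestore",
               "--data", device]
        if dmcrypt:
            cmd.append("--dmcrypt")
        subprocess.run(cmd, check=True)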

sage


On Fri, 6 Dec 2019, Sebastien Han wrote:

> Hi Kai,
> 
> Thanks!
> –––––––––
> Sébastien Han
> Senior Principal Software Engineer, Storage Architect
> 
> "Always give 100%. Unless you're giving blood."
> 
> On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
> >
> > Hi Sebastien and thanks for your feedback.
> >
> > On 06.12.19 10:00, Sebastien Han wrote:
> > > ceph-volume is a sunk cost!
> > > And your argument basically falls into that paradigm: "oh, we have
> > > invested so much already that we cannot stop, and we should continue
> > > even though this will only bring more trouble". That is being incapable
> > > of accepting the sunk cost.
> > > All the issues that have been fixed came with a lot of pain.
> > > All that pain could have been avoided if LVM wasn't there, and pursuing
> > > that direction will only lead us to more pain again.
> >
> > The reason I disagree here is the scenario where the WAL/DB is on a
> > separate device and a single OSD crashes. In that case you would like to
> > recreate just that single OSD instead of the whole group. Also, if we
> > deprecate a tool like we did with ceph-disk, users have to migrate
> > sooner or later if they don't want to do everything manually on the CLI
> > (by that I mean via fdisk/pure lvm commands and so on).
> >
> > We could argue now that this can still be done on the command line
> > manually, but all our efforts are towards simplicity/automation and
> > having everything in the Dashboard. If the underlying tool/functionality
> > isn't there anymore, that isn't possible.
> >
> 
> I understand your position: yes, when we start separating block/db/wal,
> things get really complex, which is why I'm sticking with block/db/wal on
> the same block device.
> Also, we haven't seen any request for separating those when running
> OSDs on PVC in the Cloud. So we would likely continue to do so for a
> while.
> 
> > > Also, I'm not saying we should replace the tool but allow not using
> > > LVM for a simple scenario to start with
> >
> > Which then leads me to: why couldn't such functionality be implemented
> > in a single tool instead of having two in the end?
> >
> > So don't get me wrong, I'm not saying that I'm against everything; I'm
> > just saying that I think this is a topic that should be discussed in
> > more depth.
> 
> Yes, that's for sure.
> 
> >
> > As said, just my two cents here.
> >
> > Kai
> >
> > --
> > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
> > Geschäftsführer: Felix Imendörffer (HRB 36809, AG Nürnberg)
> >
> >
