Re: ceph-volume simple disk scenario without LVM for OSD on PVC

On Fri, Dec 06, 2019 at 01:31:05PM +0000, Sage Weil wrote:
>My thoughts here are still pretty inconclusive...
>
>I agree that we should invest in a non-LVM mode, but there isn't a way to do
>that currently that supports dm-crypt that isn't complicated and
>convoluted, so it cannot be a full replacement for the LVM mode.
>
>At the same time, Real Soon Now we're going to be building crimson OSDs
>backed by ZNS SSDs (and eventually persistent memory), which will also
>very clearly not be LVM-based.  I'm a bit hesitant to introduce a
>bare-bones bluestore mode right now just because we'll be adding yet
>another variation soon, and it may be that we construct a general approach
>to both... but probably not.  And the whole point of c-v's architecture
>was to be pluggable.
>
>So maybe a bare-bones bluestore mode makes sense.  In the simple case, it
>really should be *very* simple.  But its scope pretty quickly explodes:
>what about wal and db devices?  We have labels for those, so we could
>support those, also easily... if the user has to partition the devices
>beforehand manually.  They'll immediately want to use the new
>auto/batch thing, but that's tied to the LVM implementation.  And what
>if one of the db/wal/main devices is an LV and another is not?  We'd
>need to make sure the lvm mode machinery doesn't trigger unless all of
>its labels are there, but it might be confusing.  All of which means that
>this is probably *only* useful for single-device OSDs.  On the one hand,
>those are increasingly common (hello, all-SSD clusters), but on the other
>hand, for fast SSDs we may want to deploy N of them per device.
I don't think a simple or barebones approach will survive contact with
real-world deployments. IMHO, if we want a raw mode, we had better be prepared
to deal with multi-device OSDs and multi-OSD devices, and the partitioning this
requires.
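
To make the mixed-device case concrete, here is a rough sketch of the check
described above, where the lvm machinery only triggers when every device of an
OSD is an LV. The `Device` type and `pick_mode` function are made up for
illustration and are not part of ceph-volume; how a device is actually detected
as an LV is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical description of one device backing an OSD."""
    path: str
    role: str    # "block", "db", or "wal"
    is_lv: bool  # assumed to be determined elsewhere


def pick_mode(devices: List[Device]) -> str:
    """Return "lvm" if all devices are LVs, "raw" if none are, else "mixed"."""
    if not devices:
        raise ValueError("an OSD needs at least one device")
    lv_count = sum(1 for d in devices if d.is_lv)
    if lv_count == len(devices):
        return "lvm"
    if lv_count == 0:
        return "raw"
    # Some devices are LVs and some are not: neither mode fully applies.
    return "mixed"
```

A single-device OSD always lands on "lvm" or "raw"; only multi-device OSDs can
end up "mixed", which is the case that would need an explicit policy.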
>
>Since we can't cover all of that, and at a minimum, we can't cover
>dm-crypt, Rook will need to behave with the lvm mode one way or another.
>So we need to have a wrapper (or something similar) no matter what.  So I
>suggest we start there.
Agreed.
>
>sage
>
>
>On Fri, 6 Dec 2019, Sebastien Han wrote:
>
>> Hi Kai,
>>
>> Thanks!
>> –––––––––
>> Sébastien Han
>> Senior Principal Software Engineer, Storage Architect
>>
>> "Always give 100%. Unless you're giving blood."
>>
>> On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
>> >
>> > Hi Sebastien and thanks for your feedback.
>> >
>> > On 06.12.19 10:00, Sebastien Han wrote:
>> > > ceph-volume is a sunk cost!
>> > > And your argument basically falls into that paradigm, "oh we have
>> > > invested so much already, that we cannot stop and we should continue
>> > > even though this will only bring more trouble". Incapable of accepting
>> > > this sunk cost.
>> > > All the issues that have been fixed with a lot of pain.
>> > > All that pain could have been avoided if LVM wasn't there and pursuing
>> > > in that direction will only lead us to more pain again.
>> >
>> > The reason I disagree here is the scenario where the WAL/DB is on a
>> > separate device and a single OSD crashes. In that case you would like to
>> > recreate just that single OSD instead of the whole group. Also, if we
>> > deprecate a tool, as we did with ceph-disk, users have to migrate
>> > sooner or later if they don't want to do everything manually on the CLI
>> > (by that I mean via fdisk/pure lvm commands and so on).
>> >
>> > We could argue now that this can still be done on the command line
>> > manually but all our efforts are towards simplicity/automation and
>> > having everything in the Dashboard. If the underlying tool/functionality
>> > isn't there anymore, that isn't possible.
>> >
>>
>> I understand your position. Yes, when we start separating block/db/wal,
>> things get really complex; that's why I'm sticking with block/db/wal on
>> the same block device.
>> Also, we haven't seen any request for separating those when running
>> OSDs on PVC in the Cloud. So we would likely continue to do so for a
>> while.
>>
>> > > Also, I'm not saying we should replace the tool but allow not using
>> > > LVM for a simple scenario to start with
>> >
>> > Which then leads me to, why couldn't such functionality be implemented
>> > into a single tool instead of having two at the end?
>> >
>> > So don't get me wrong, I'm not saying that I'm against everything; I'm
>> > just saying that I think this is a topic that should be discussed in
>> > more depth.
>>
>> Yes, that's for sure.
>>
>> >
>> > As said, just my two cents here.
>> >
>> > Kai
>> >
>> > --
>> > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
>> > GF:Geschäftsführer: Felix Imendörffer, (HRB 36809, AG Nürnberg)
>> >
>> >
>> _______________________________________________
>> Dev mailing list -- dev@xxxxxxx
>> To unsubscribe send an email to dev-leave@xxxxxxx
>>



-- 
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer