On Fri, 6 Dec 2019, Alfredo Deza wrote:
> On Fri, Dec 6, 2019 at 8:31 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >
> > My thoughts here are still pretty inconclusive...
> >
> > I agree that we should invest in a non-LVM mode, but there isn't
> > currently a way to do that which supports dm-crypt and isn't
> > complicated and convoluted, so it cannot be a full replacement for
> > the LVM mode.
>
> The `ceph-volume simple` sub-command does allow dmcrypt. The key is
> stored in the JSON file in /etc/ceph/osd.
>
> Is there a scenario you've seen where this is not possible? The
> `simple` sub-command would even allow partitions (regardless of
> ceph-disk).

For the dm-crypt case, I'm assuming we need the key to be attached to
the device in some way. LVM does this with another LV (IIRC); ceph-disk
did it with a tiny partition. Putting it in /etc/ceph means you can't
move a disk to another server without manually copying files around.

sage

> >
> > At the same time, Real Soon Now we're going to be building crimson
> > OSDs backed by ZNS SSDs (and eventually persistent memory), which
> > will also very clearly not be LVM-based. I'm a bit hesitant to
> > introduce a bare-bones bluestore mode right now just because we'll
> > be adding yet another variation soon, and it may be that we
> > construct a general approach to both... but probably not. And the
> > whole point of c-v's architecture was to be pluggable.
> >
> > So maybe a bare-bones bluestore mode makes sense. In the simple
> > case, it really should be *very* simple. But its scope pretty
> > quickly explodes: what about wal and db devices? We have labels for
> > those, so we could support those fairly easily... provided the user
> > partitions the devices manually beforehand. They'll immediately want
> > to use the new auto/batch thing, but that's tied to the LVM
> > implementation. And what if one of the db/wal/main devices is an LV
> > and another is not?
> > We'd need to make sure the lvm mode machinery doesn't trigger unless
> > all of its labels are there, but it might be confusing. All of which
> > means that this is probably *only* useful for single-device OSDs. On
> > the one hand, those are increasingly common (hello, all-SSD
> > clusters), but on the other hand, for fast SSDs we may want to
> > deploy N of them per device.
> >
> > Since we can't cover all of that, and at a minimum we can't cover
> > dm-crypt, Rook will need to work with the lvm mode one way or
> > another. So we need to have a wrapper (or something similar) no
> > matter what. So I suggest we start there.
> >
> > sage
> >
> >
> > On Fri, 6 Dec 2019, Sebastien Han wrote:
> >
> > > Hi Kai,
> > >
> > > Thanks!
> > > –––––––––
> > > Sébastien Han
> > > Senior Principal Software Engineer, Storage Architect
> > >
> > > "Always give 100%. Unless you're giving blood."
> > >
> > > On Fri, Dec 6, 2019 at 10:44 AM Kai Wagner <kwagner@xxxxxxxx> wrote:
> > > >
> > > > Hi Sebastien and thanks for your feedback.
> > > >
> > > > On 06.12.19 10:00, Sebastien Han wrote:
> > > > > ceph-volume is a sunk cost!
> > > > > And your argument basically falls into that paradigm: "oh, we
> > > > > have invested so much already that we cannot stop, and we
> > > > > should continue even though this will only bring more trouble"
> > > > > -- incapable of accepting the sunk cost, and all the issues
> > > > > that have been fixed with a lot of pain. All that pain could
> > > > > have been avoided if LVM wasn't there, and pursuing that
> > > > > direction will only lead us to more pain again.
> > > >
> > > > The reason I disagree here is the scenario where the WAL/DB is on
> > > > a separate device and a single OSD crashes. In that case you
> > > > would like to recreate just that single OSD instead of the whole
> > > > group.
> > > > Also, if we deprecate a tool as we did with ceph-disk, users
> > > > have to migrate sooner or later if they don't want to do
> > > > everything manually on the CLI (by that I mean via fdisk/pure
> > > > lvm commands and so on).
> > > >
> > > > We could argue now that this can still be done on the command
> > > > line manually, but all our efforts are towards
> > > > simplicity/automation and having everything in the Dashboard. If
> > > > the underlying tool/functionality isn't there anymore, that
> > > > isn't possible.
> > > >
> > >
> > > I understand your position: yes, when we start separating
> > > block/db/wal things get really complex, and that's why I'm
> > > sticking with block/db/wal on the same block device.
> > > Also, we haven't seen any request for separating those when
> > > running OSDs on PVC in the Cloud, so we would likely continue to
> > > do so for a while.
> > >
> > > > > Also, I'm not saying we should replace the tool, but allow not
> > > > > using LVM for a simple scenario to start with.
> > > >
> > > > Which then leads me to: why couldn't such functionality be
> > > > implemented in a single tool instead of having two in the end?
> > > >
> > > > So don't get me wrong, I'm not saying that I'm against
> > > > everything. I'm just saying that I think this is a topic that
> > > > should be discussed in more depth.
> > >
> > > Yes, that's for sure.
> > >
> > > >
> > > > As said, just my two cents here.
> > > >
> > > > Kai
> > > >
> > > > --
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstr.
> > > > 5, D-90409 Nürnberg
> > > > Geschäftsführer: Felix Imendörffer (HRB 36809, AG Nürnberg)
> > > >
> > >
> > > _______________________________________________
> > > Dev mailing list -- dev@xxxxxxx
> > > To unsubscribe send an email to dev-leave@xxxxxxx
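For context on the point Alfredo and Sage are debating above: `ceph-volume simple scan` records per-OSD activation metadata (including the dmcrypt secret, when encryption is used) as a JSON file under /etc/ceph/osd/ on the host. A minimal Python sketch of that pattern follows; the field names and file-naming scheme here are illustrative assumptions, not the exact schema `ceph-volume` emits. It shows why Sage's portability concern applies: the key lives on the host filesystem rather than on the device, so moving a disk to another server means copying this file too.

```python
import json
import os
import tempfile

# Hypothetical sketch of per-OSD JSON metadata kept on the host, in the
# spirit of what `ceph-volume simple scan` writes under /etc/ceph/osd/.
# Field names below are illustrative, not the real schema.

def save_osd_metadata(directory, osd_id, fsid, data_device, dmcrypt_key=None):
    """Persist OSD activation metadata as <osd_id>-<fsid>.json on the host."""
    meta = {
        "type": "bluestore",
        "fsid": fsid,
        "data": {"path": data_device},
        "encrypted": dmcrypt_key is not None,
    }
    if dmcrypt_key is not None:
        # The dmcrypt secret ends up in a host-local file -- this is the
        # portability problem Sage raises: it does not travel with the disk.
        meta["dmcrypt_key"] = dmcrypt_key
    path = os.path.join(directory, "{}-{}.json".format(osd_id, fsid))
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
    return path

def load_osd_metadata(path):
    """Read the metadata back, as an activation step would."""
    with open(path) as f:
        return json.load(f)

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = save_osd_metadata(d, 0, "aaaa-bbbb", "/dev/sdb1", dmcrypt_key="s3cret")
    meta = load_osd_metadata(p)
    print(meta["encrypted"], "dmcrypt_key" in meta)
```

By contrast, the schemes Sage mentions (a dedicated LV, or ceph-disk's tiny key partition) attach the secret to the device itself, so the OSD stays self-describing when the disk is moved between hosts.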