I would also think that both ansible playbooks and ceph-volume might be
technologies utilized by this orchestration layer, correct?

On Mon, Jul 16, 2018 at 9:05 AM, Brett Niver <bniver@xxxxxxxxxx> wrote:
> Got it.  And yes, explained that way, I wasn't really thinking about
> orchestration management, but that makes sense.
>
>
>
> On Mon, Jul 16, 2018 at 9:00 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> On Mon, 16 Jul 2018, Brett Niver wrote:
>>> I would ask if either ceph-mgr or ceph-volume is the correct place.
>>> To me it seems like a "run to finish, sequential automation" type of
>>> process, which might better be implemented in an ansible playbook
>>> utilizing ceph-volume?
>>
>> The problem is that this process takes weeks or months, and users will
>> realistically need to pause/resume, perhaps change strategy or abort,
>> resume again a few weeks later, etc.  I don't think that having users
>> leave a terminal open somewhere running a script is a good choice.
>>
>> The upside is that I think the orchestrator mgr layer we're building
>> provides the right set of tools to build this pretty easily.  Doing it
>> there means it can work equally well (with an identical user experience)
>> with ansible, rook, deepsea, whatever.
>>
>> sage
>>
>>
>>>
>>>
>>> On Mon, Jul 16, 2018 at 8:11 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>> > On Mon, Jul 9, 2018 at 11:12 AM, Theofilos Mouratidis
>>> > <mtheofilos@xxxxxxxxx> wrote:
>>> >> Hello,
>>> >>
>>> >> Here at CERN we created some scripts to convert single hosts
>>> >> from filestore to bluestore, with or without journals (I'm
>>> >> running it as we speak).  It might be worth a look.
>>> >> The one with journals is here: https://pastebin.com/raw/0mCQHuAR
>>> >> For now it requires every OSD to be filestore and each SSD
>>> >> to have the same number of OSDs.  The OSD IDs are preserved
>>> >> to avoid data rebalance.
>>> >>
>>> >> First it checks for the required packages.  Then it creates a
>>> >> plan file under /tmp to execute.  From the plan it computes
>>> >> various parameters, such as the number of SSDs and HDDs,
>>> >> partition sizes, etc.  It follows the official guide you gave
>>> >> for converting a host.  In the end, after the OSDs are drained,
>>> >> they are converted to bluestore with the journal now serving as
>>> >> the block.db, and they are marked in to get the backfilled data
>>> >> back.  The job is done per set of X OSDs that share the same
>>> >> journal device.
>>> >>
>>> >> Cheers,
>>> >> Theo
>>> >>
>>> >> On 9 July 2018 at 15:24, John Spray <jspray@xxxxxxxxxx> wrote:
>>> >>> On Fri, Jul 6, 2018 at 7:05 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>>> >>>>
>>> >>>> https://pad.ceph.com/p/bluestore_converter
>>> >>>>
>>> >>>> I sketched out a mgr module that automates the conversion of OSDs
>>> >>>> from filestore to bluestore.  It basically has two modes (by OSD and by
>>> >>>> host), mapping to the two variations documented in the docs.  The main
>>> >>>> difference is that it would do groups of OSDs that share devices, so if
>>> >>>> you have a 5:1 HDD:SSD ratio it would do 5 OSDs and 6 devices at a time so
>>> >>>> that the devices can be fully wiped (and we can move from GPT to LVM).
>>> >>>>
>>> >>>> There is a big dependency on the new mgr orchestrator layer.  John, does
>>> >>>> this line up with what you're designing?
>>> >>>
>>> >>> Yes -- particularly the need to tag explicit OSD IDs onto the
>>> >>> definition of a drive group is something that came up thinking about
>>> >>> how drive replacement will work in general.
>>> >>>
>>> >>> The set of transformations we can do on these groups for OSD
>>> >>> replacement is the next (last?) big question to answer about what
>>> >>> ceph-volume's interface should look like.  Right now the cases I have
>>> >>> are:
>>> >>> - Normal creation: just a list of devices
>>> >>> - Migration creation: a list of devices and a list of OSD IDs
>>> >>> - In-place replacement (drive name of replacement is the same as the
>>> >>> original): a list of devices and the name of the device to replace,
>>> >>> preserving its OSD ID.
>>> >>> - General replacement (drive name of replacement is different): a
>>> >>> list of devices which includes a new device, and the OSD ID that
>>> >>> should be applied to the new device.
>>> >>> - (Maybe) HDD addition, where during initial creation a number of
>>> >>> "blanks" had been specified to reserve space on SSDs, and we can
>>> >>> consume these with new HDD members of the group.
>>> >
>>> > Seems like most of the steps for converting can be done by
>>> > ceph-volume.  Is polling safe-to-destroy the reason for placing this
>>> > in the mgr vs delegating the functionality to ceph-volume?
>>> >
>>> > From http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/?highlight=bluestore#mark-out-and-replace
>>> > these are the ones that ceph-volume can handle today with internal
>>> > APIs:
>>> >
>>> > * identify if an OSD is bluestore/filestore
>>> > * identify what devices make up the OSD
>>> > * stop/start/status on systemctl units
>>> > * find the current mount point of an OSD, and see if devices are
>>> > currently mounted at the target
>>> > * mount and unmount
>>> >
>>> > The guide doesn't explain anything about encryption; as complex as
>>> > that is today, it might be best not to try to handle it in more than
>>> > one place.
>>> >
>>> >
>>> >>>
>>> >>> This is a longer list than I'd like, but I don't see a way to make it
>>> >>> shorter (with the exception of dropping the ability to grow groups).
>>> >>>
>>> >>> I've written a document to try and formalize this stuff a bit:
>>> >>> https://docs.google.com/document/d/1iwTnQc8d9W3BpQHgGYTMZSKvN6J7s0z8kQaYNxYvLho
>>> >>> (Google Docs may prompt you to ask for access)
>>> >>>
>>> >>> Just updating the orchestrator Python code to reflect that doc now.
>>> >>>
>>> >>> John
>>> >>>
>>> >>>> Also it would need/like the ability to pass a list of OSD IDs to
>>> >>>> reuse to the new batch prepare function you're building...
>>> >>>
>>> >>>> Thoughts?
>>> >>>> sage
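
For reference, the per-OSD sequence under discussion is roughly the
"mark out and replace" procedure from the bluestore-migration guide
linked above.  This is a paraphrased sketch of that guide, not the
proposed mgr module; the device path, OSD ID, and 60-second poll
interval are placeholders:

    ID=<osd-id>            # OSD to convert, keeping its ID
    DEVICE=<data device>   # placeholder, e.g. the HDD backing the OSD

    ceph osd out $ID
    # wait until the cluster no longer depends on this OSD's data
    while ! ceph osd safe-to-destroy osd.$ID ; do sleep 60 ; done
    systemctl stop ceph-osd@$ID
    umount /var/lib/ceph/osd/ceph-$ID
    ceph-volume lvm zap $DEVICE
    ceph osd destroy $ID --yes-i-really-mean-it
    # recreate as bluestore, reusing the original OSD ID
    ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID

The safe-to-destroy wait is the open-ended step; repeated across a whole
cluster it is what stretches the process to weeks or months, which is why
Sage argues above for driving it from the mgr rather than a terminal
session.  The by-host and shared-journal variants (as in the CERN script)
wrap SSD repartitioning and block.db setup around the same core sequence.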