On Mon, Jul 16, 2018 at 8:32 AM, Brett Niver <bniver@xxxxxxxxxx> wrote:
> I would ask if either ceph-mgr or ceph-volume is the correct place.
> To me it seems like a "run to finish, sequential automation" type of
> process, which might better be implemented in an ansible playbook
> utilizing ceph-volume?

That is an interesting question. Sure, I think the end-to-end process
might be a great fit for an Ansible playbook. The intermediate
portions, though, are so complicated that I was wondering whether
ceph-volume couldn't do more here, since it already knows how.

>
> On Mon, Jul 16, 2018 at 8:11 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Mon, Jul 9, 2018 at 11:12 AM, Theofilos Mouratidis
>> <mtheofilos@xxxxxxxxx> wrote:
>>> Hello,
>>>
>>> Here at CERN we created some scripts to convert single hosts from
>>> filestore to bluestore, with or without journals. (I am running it
>>> as we speak.) It might be worth a look.
>>> The one with journals is here: https://pastebin.com/raw/0mCQHuAR
>>> For now it requires every OSD to be filestore and each SSD to have
>>> the same number of OSDs. The OSD IDs are preserved to avoid data
>>> rebalancing.
>>>
>>> First it checks for the required packages. Then it creates a plan
>>> file on /tmp to execute. From the plan it computes various
>>> parameters, such as the number of SSDs and HDDs, partition sizes,
>>> etc. It follows the official guide you gave for converting a host.
>>> In the end, after the OSDs are drained, they are converted to
>>> bluestore with the journal now serving as the block.db, and they
>>> are marked in to get the backfilled data back. The job is done per
>>> set of X OSDs that share the same journal device.
>>>
>>> Cheers,
>>> Theo
>>>
>>> On 9 July 2018 at 15:24, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> On Fri, Jul 6, 2018 at 7:05 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>>>>>
>>>>> https://pad.ceph.com/p/bluestore_converter
>>>>>
>>>>> I sketched out a mgr module that automates the conversion of OSDs
>>>>> from filestore to bluestore. It basically has two modes (by OSD
>>>>> and by host), mapping to the two variations documented in the
>>>>> docs. The main difference is that it would do groups of OSDs that
>>>>> share devices, so if you have a 5:1 HDD:SSD ratio it would do 5
>>>>> OSDs and 6 devices at a time so that the devices can be fully
>>>>> wiped (and we can move from GPT to LVM).
>>>>>
>>>>> There is a big dependency on the new mgr orchestrator layer.
>>>>> John, does this line up with what you're designing?
>>>>
>>>> Yes -- particularly the need to tag explicit OSD IDs onto the
>>>> definition of a drive group is something that came up when
>>>> thinking about how drive replacement will work in general.
>>>>
>>>> The set of transformations we can do on these groups for OSD
>>>> replacement is the next (last?) big question to answer about what
>>>> ceph-volume's interface should look like. Right now the cases I
>>>> have are:
>>>> - Normal creation: just a list of devices.
>>>> - Migration creation: a list of devices and a list of OSD IDs.
>>>> - In-place replacement (drive name of the replacement is the same
>>>>   as the original): a list of devices and the name of the device
>>>>   to replace, preserving its OSD ID.
>>>> - General replacement (drive name of the replacement is
>>>>   different): a list of devices which includes a new device, and
>>>>   the OSD ID that should be applied to the new device.
>>>> - (Maybe) HDD addition, where during initial creation a number of
>>>>   "blanks" had been specified to reserve space on SSDs, and we can
>>>>   consume these with new HDD members of the group.
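
To make sure I'm reading those cases right, here is one purely
hypothetical way they could collapse into a single request shape.
None of these names exist in ceph-volume or the orchestrator work;
they are only meant as an illustration:

# Hypothetical sketch: how the creation/replacement cases above might
# map onto one request object. Names and fields are invented here.
class DriveGroupRequest(object):
    def __init__(self, devices, osd_ids=None, replace_device=None,
                 replace_osd_id=None, blanks=0):
        self.devices = devices                # all member devices of the group
        self.osd_ids = osd_ids                # migration creation: reuse these OSD IDs
        self.replace_device = replace_device  # in-place replacement target
        self.replace_osd_id = replace_osd_id  # general replacement: ID for the new device
        self.blanks = blanks                  # reserved SSD slots for later HDD additions


# Normal creation: just the devices.
DriveGroupRequest(devices=['/dev/sdb', '/dev/sdc', '/dev/nvme0n1'])

# Migration creation: devices plus the OSD IDs to reuse.
DriveGroupRequest(devices=['/dev/sdb', '/dev/sdc'], osd_ids=[12, 13])

# In-place replacement: same device name, preserving its OSD ID.
DriveGroupRequest(devices=['/dev/sdb', '/dev/sdc'], replace_device='/dev/sdc')

# General replacement: a new device name, and the OSD ID it should take over.
DriveGroupRequest(devices=['/dev/sdb', '/dev/sdd'], replace_osd_id=13)
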
>>
>> Seems like most of the steps for converting can be done by
>> ceph-volume. Is polling the safe-to-destroy check the reason for
>> placing this in the mgr vs delegating the functionality to
>> ceph-volume?
>>
>> From http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/?highlight=bluestore#mark-out-and-replace
>> these are the steps that ceph-volume can handle today with internal
>> APIs:
>>
>> * identify if an OSD is bluestore/filestore
>> * identify what devices make up the OSD
>> * stop/start/status on systemctl units
>> * find the current mount point of an OSD, and see if devices are
>>   currently mounted at the target
>> * mount and unmount
>>
>> The guide doesn't explain anything about encryption; as complex as
>> that is today, it might be useful not to try to handle it in more
>> than one place.
>>
>>>>
>>>> This is a longer list than I'd like, but I don't see a way to make
>>>> it shorter (with the exception of dropping the ability to grow
>>>> groups).
>>>>
>>>> I've written a document to try and formalize this stuff a bit:
>>>> https://docs.google.com/document/d/1iwTnQc8d9W3BpQHgGYTMZSKvN6J7s0z8kQaYNxYvLho
>>>> (google docs may prompt you to ask for access)
>>>>
>>>> Just updating the orchestrator python code to reflect that doc now.
>>>>
>>>> John
>>>>
>>>>> Also it would need/like the ability to pass a list of OSD IDs to
>>>>> reuse to the new batch prepare function you're building...
>>>>
>>>>> Thoughts?
>>>>> sage
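
For what it's worth, a rough sketch of the per-OSD sequence we are
talking about: this is just the mark-out-and-replace steps from the
guide above wrapped in Python, not mgr or ceph-volume code. The
helper, device paths and DB partition are placeholders, and the
group-at-a-time handling of a shared journal SSD is exactly the part
it glosses over.

import subprocess
import time


def run(*cmd):
    # Run a command and raise if it fails.
    subprocess.run(list(cmd), check=True)


def convert_osd(osd_id, data_device, db_device=None):
    # Drain the OSD, then wait until the cluster says it is safe to remove.
    run('ceph', 'osd', 'out', str(osd_id))
    while subprocess.run(['ceph', 'osd', 'safe-to-destroy',
                          str(osd_id)]).returncode != 0:
        time.sleep(60)

    # Stop the daemon and release the filestore mount.
    run('systemctl', 'stop', 'ceph-osd@{}'.format(osd_id))
    run('umount', '/var/lib/ceph/osd/ceph-{}'.format(osd_id))

    # Wipe the old data device and mark the OSD destroyed so its ID can
    # be reused.
    run('ceph-volume', 'lvm', 'zap', data_device)
    run('ceph', 'osd', 'destroy', str(osd_id), '--yes-i-really-mean-it')

    # Recreate the OSD as bluestore with the same ID; the old journal
    # partition becomes block.db when one is given.
    create = ['ceph-volume', 'lvm', 'create', '--bluestore',
              '--data', data_device, '--osd-id', str(osd_id)]
    if db_device:
        create += ['--block.db', db_device]
    run(*create)


# Example values only:
# convert_osd(12, '/dev/sdb', '/dev/sdk1')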