Re: automated bluestore conversion

Hello,

Here at CERN we have created some scripts to convert
single hosts from filestore to bluestore, with or without
journals (I am running one as we speak). They might be worth a look.
The one with journals is here: https://pastebin.com/raw/0mCQHuAR
For now it requires every OSD on the host to be filestore and
every SSD to serve the same number of OSDs.
The OSD ids are preserved to avoid a data rebalance.

First it checks for the required packages.
Then it creates a plan file under /tmp to execute.
From the plan it derives various parameters,
such as the number of SSDs and HDDs, the partition
sizes, etc. It follows the official guide you gave
for converting a host. In the end, after the OSDs
are drained, they are converted to bluestore
with the journal now serving as the block.db, and they
are marked in again so the data is backfilled onto them.
The job is done per set of X OSDs that share
the same journal device.
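
Roughly, the per-OSD part boils down to the steps from
that guide. A simplified python sketch (not the exact code
from the script above; device paths are placeholders and
most error handling is left out):

import subprocess
import time

def convert_osd(osd_id, data_dev, db_dev):
    run = lambda *cmd: subprocess.check_call(list(cmd))

    # stop taking new data and wait until the OSD is safe to remove
    run("ceph", "osd", "out", str(osd_id))
    while subprocess.call(["ceph", "osd", "safe-to-destroy",
                           "osd.%d" % osd_id]) != 0:
        time.sleep(60)

    # stop the daemon and destroy the OSD, keeping its id for reuse
    # (unmounting the filestore mount is omitted here)
    run("systemctl", "stop", "ceph-osd@%d" % osd_id)
    run("ceph", "osd", "destroy", str(osd_id), "--yes-i-really-mean-it")

    # wipe the data disk and the old journal partition
    run("ceph-volume", "lvm", "zap", data_dev)
    run("ceph-volume", "lvm", "zap", db_dev)

    # recreate as bluestore, the old journal device becomes block.db
    run("ceph-volume", "lvm", "create", "--bluestore",
        "--data", data_dev, "--block.db", db_dev,
        "--osd-id", str(osd_id))

    # mark it in again so the data gets backfilled back
    run("ceph", "osd", "in", str(osd_id))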

Cheers,
Theo

On 9 July 2018 at 15:24, John Spray <jspray@xxxxxxxxxx> wrote:
> On Fri, Jul 6, 2018 at 7:05 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>>
>> https://pad.ceph.com/p/bluestore_converter
>>
>> I sketched out a mgr module that automates the conversion of OSDs
>> from filestore to bluestore.  It basically has two modes (by osd and by
>> host), mapping to the two variations documented in the docs.  The main
>> difference is that it would do groups of OSDs that share devices, so if
>> you have a 5:1 HDD:SSD ratio it would do 5 OSDs and 6 devices at a time so
>> that the devices can be fully wiped (and we can move from GPT to LVM).
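>>
>> The grouping itself would be something like this (an illustrative
>> sketch only, with made-up names, not the pad's actual code):
>>
>> from collections import defaultdict
>>
>> def conversion_batches(osds):
>>     # osds: e.g. [{'id': 12, 'data': '/dev/sdb', 'db': '/dev/sdk'}, ...]
>>     by_db = defaultdict(list)
>>     for osd in osds:
>>         by_db[osd['db']].append(osd)
>>     for db_dev, group in by_db.items():
>>         osd_ids = [o['id'] for o in group]
>>         devices = [o['data'] for o in group] + [db_dev]
>>         # 5:1 HDD:SSD ratio -> batches of 5 OSDs and 6 devices,
>>         # so every device in the batch can be fully wiped
>>         yield osd_ids, devices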
>>
>> There is a big dependency on the new mgr orchestrator layer.  John, does
>> this line up with what you're designing?
>
> Yes -- particularly the need to tag explicit OSD IDs onto the
> definition of a drive group is something that came up when thinking
> about how drive replacement will work in general.
>
> The set of transformations we can do on these groups for OSD
> replacement is the next (last?) big question to answer about what
> ceph-volume's interface should look like.  Right now the cases I have
> are:
>  - Normal creation: just a list of devices
>  - Migration creation: a list of devices and a list of OSD IDs
>  - In-place replacement (the replacement drive has the same name as
> the original): a list of devices and the name of the device to
> replace, preserving its OSD ID.
>  - General replacement (the replacement drive has a different name):
> a list of devices which includes a new device, and the OSD ID that
> should be applied to the new device.
>  - (Maybe) HDD addition, where during initial creation a number of
> "blanks" had been specified to reserve space on SSDs, and we can
> consume these with new HDD members of the group.
>
> This is a longer list than I'd like, but I don't see a way to make it
> shorter (with the exception of dropping the ability to grow groups).
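>
> To make that concrete, the group spec might carry something like the
> shape below (field names purely illustrative, not the actual
> interface):
>
> from dataclasses import dataclass, field
> from typing import Dict, List
>
> @dataclass
> class DriveGroup:
>     # devices that will carry OSD data, plus shared db/journal devices
>     data_devices: List[str]
>     db_devices: List[str] = field(default_factory=list)
>     # migration / replacement: existing OSD ids to claim instead of
>     # allocating new ones, keyed by the device that should take each id
>     osd_id_claims: Dict[str, int] = field(default_factory=dict)
>     # reserved slots on the shared SSDs for HDDs added later
>     blanks: int = 0
>
> Normal creation is just data_devices; migration fills osd_id_claims
> for every device; replacement fills it only for the replacement
> device; growth consumes blanks.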
>
> I've written a document to try and formalize this stuff a bit:
> https://docs.google.com/document/d/1iwTnQc8d9W3BpQHgGYTMZSKvN6J7s0z8kQaYNxYvLho
> (google docs may prompt you to ask for access)
>
> Just updating the orchestrator python code to reflect that doc now.
>
> John
>
>> Also it would need/like the ability to pass a list of OSD IDs to reuse to
>> the new batch prepare function you're building...
>
>
>
>> Thoughts?
>> sage
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


