Re: automated bluestore conversion

On Mon, 16 Jul 2018, Sage Weil wrote:
> On Mon, 16 Jul 2018, Brett Niver wrote:
> > I would ask if either ceph-mgr or ceph-volume is the correct place.
> > To me it seems like a "run to finish, sequential automation" type of
> > process, which might better be implemented in an ansible playbook
> > utilizing ceph-volume?
> 
> The problem is that this process takes weeks or months, and users will 
> realistically need to pause/resume, perhaps change strategy or abort, 
> resume again a few weeks later, etc.  I don't think that having users 
> leave a terminal open somewhere running a script is a good choice.
> 
> The upside is that I think the orchestrator mgr layer we're building 
> provides the right set of tools to build this pretty easily.  Doing it 
> there means it can work equally well (with an identical user experience) 
> with ansible, rook, deepsea, whatever.

Putting it in the mgr would also allow us to do things like scheduling 
the time of day to do replacements, adjusting recovery throttling 
automatically, or whatever else we decide would improve the overall 
process.
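
To make that concrete, here is a rough, hypothetical sketch of the kind of 
per-OSD loop such a module could run. Only the public ceph CLI commands are 
real; the maintenance window, throttle value, and function names are made 
up for illustration:

import datetime
import subprocess
import time

MAINTENANCE_WINDOW = (22, 6)   # assumption: only act between 22:00 and 06:00
MAX_BACKFILLS = 1              # assumption: conservative recovery throttle

def ceph(*args):
    # run a public ceph CLI command; return exit code and stdout
    p = subprocess.run(["ceph"] + list(args), capture_output=True, text=True)
    return p.returncode, p.stdout

def in_window():
    hour = datetime.datetime.now().hour
    start, end = MAINTENANCE_WINDOW
    return hour >= start or hour < end

def convert_osd(osd_id):
    if not in_window():
        return False                       # paused; retry on a later pass
    # throttle recovery while the OSD drains
    ceph("tell", "osd.*", "injectargs", "--osd-max-backfills %d" % MAX_BACKFILLS)
    ceph("osd", "out", str(osd_id))        # start draining
    # "ceph osd safe-to-destroy" exits 0 once no data depends on the OSD
    while ceph("osd", "safe-to-destroy", str(osd_id))[0] != 0:
        time.sleep(60)
    ceph("osd", "destroy", str(osd_id), "--yes-i-really-mean-it")
    # ...re-provision as bluestore with ceph-volume, reusing the same OSD id...
    return True

A real module would also persist where it is in the sequence, so pausing, 
changing strategy, or resuming weeks later just means stopping or restarting 
the loop between OSDs.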

s

> 
> sage
> 
> 
> > 
> > 
> > On Mon, Jul 16, 2018 at 8:11 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> > > On Mon, Jul 9, 2018 at 11:12 AM, Theofilos Mouratidis
> > > <mtheofilos@xxxxxxxxx> wrote:
> > >> Hello,
> > >>
> > >> Here at CERN we created some scripts to convert single hosts from
> > >> filestore to bluestore, with or without journals (I'm running it as we
> > >> speak). It might be worth a look.
> > >> The one with journals is here: https://pastebin.com/raw/0mCQHuAR
> > >> For now it requires every osd to be filestore and each ssd to have the
> > >> same number of osds. The osd ids are preserved to avoid rebalancing
> > >> data.
> > >>
> > >> First it checks for the required packages. Then it creates a plan file
> > >> in /tmp to execute. From the plan it computes various parameters, such
> > >> as the number of ssds and hdds, partition sizes, etc. It follows the
> > >> official guide you gave for converting a host. In the end, after the
> > >> osds are drained, they are converted to bluestore with the journal now
> > >> serving as the block.db, and they are marked in to get the backfilled
> > >> data back. The job is done per set of X osds that share the same
> > >> journal device.
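> > >>
> > >> (Roughly, and not the actual script, the plan step boils down to
> > >> something like the sketch below: group the filestore osds by the ssd
> > >> that holds their journal, and split that ssd evenly into block.db
> > >> partitions. All the names and fields here are just illustrative.)
> > >>
> > >> from collections import defaultdict
> > >>
> > >> def build_plan(osds, ssd_bytes):
> > >>     # osds: list of dicts like {"id": 12, "journal_ssd": "/dev/nvme0n1"}
> > >>     # ssd_bytes: map of each ssd device to its size in bytes
> > >>     groups = defaultdict(list)
> > >>     for osd in osds:
> > >>         # group osds by the ssd holding their journal, so each set is
> > >>         # drained, wiped and rebuilt together
> > >>         groups[osd["journal_ssd"]].append(osd["id"])
> > >>     plan = []
> > >>     for ssd, osd_ids in groups.items():
> > >>         plan.append({
> > >>             "ssd": ssd,
> > >>             "osd_ids": osd_ids,  # ids are preserved
> > >>             # one equally sized block.db partition per osd on that ssd
> > >>             "block_db_bytes": ssd_bytes[ssd] // len(osd_ids),
> > >>         })
> > >>     return plan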
> > >>
> > >> Cheers,
> > >> Theo
> > >>
> > >> On 9 July 2018 at 15:24, John Spray <jspray@xxxxxxxxxx> wrote:
> > >>> On Fri, Jul 6, 2018 at 7:05 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> https://pad.ceph.com/p/bluestore_converter
> > >>>>
> > >>>> I sketched out a mgr module that automates the conversion of OSDs
> > >>>> from filestore to bluestore.  It basically has two modes (by osd and by
> > >>>> host), mapping to the two variations in the documentation.  The main
> > >>>> difference is that it would do groups of OSDs that share devices, so if
> > >>>> you have a 5:1 HDD:SSD ratio it would do 5 OSDs and 6 devices at a time so
> > >>>> that the devices can be fully wiped (and we can move from GPT to LVM).
> > >>>>
> > >>>> There is a big dependency on the new mgr orchestrator layer.  John, does
> > >>>> this line up with what you're designing?
> > >>>
> > >>> Yes -- particularly the need to tag explicit OSD IDs onto the
> > >>> definition of a drive group is something that came up while thinking about
> > >>> how drive replacement will work in general.
> > >>>
> > >>> The set of transformations we can do on these groups for OSD
> > >>> replacement is the next (last?) big question to answer about what
> > >>> ceph-volume's interface should look like.  Right now the cases I have
> > >>> are:
> > >>>  - Normal creation: just a list of devices
> > >>>  - Migration creation: a list of devices and a list of OSD IDs
> > >>>  - In-place replacement (drive name of replacement is the same as the
> > >>> original): a list of devices and the name of the device to replace,
> > >>> preserving its OSD ID.
> > >>>  - General replacement (drive name of replacement is different): a
> > >>> list of devices which includes a new device, and the OSD ID that
> > >>> should be applied to the new device.
> > >>>  - (Maybe) HDD addition, where during initial creation a number of
> > >>> "blanks" had been specified to reserve space on SSDs, and we can
> > >>> consume these with new HDD members of the group.
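> > >>>
> > >>> To make the shapes of those cases concrete, a hypothetical spec could
> > >>> look something like the sketch below. None of these names are the real
> > >>> orchestrator interface; they only show which fields each case needs.
> > >>>
> > >>> from dataclasses import dataclass, field
> > >>> from typing import List, Optional
> > >>>
> > >>> @dataclass
> > >>> class DriveGroupRequest:
> > >>>     # hypothetical illustration of the cases above, not a real interface
> > >>>     devices: List[str]                                 # always required
> > >>>     osd_ids: List[int] = field(default_factory=list)   # migration creation
> > >>>     replace_device: Optional[str] = None               # in-place replacement
> > >>>     new_device_osd_id: Optional[int] = None            # general replacement
> > >>>     blanks: int = 0                                     # reserved ssd slots to grow into
> > >>>
> > >>> # normal creation: just a list of devices
> > >>> normal = DriveGroupRequest(devices=["/dev/sdb", "/dev/sdc"])
> > >>> # migration creation: devices plus the osd ids to reuse
> > >>> migrate = DriveGroupRequest(devices=["/dev/sdb", "/dev/sdc"], osd_ids=[12, 13])
> > >>> # in-place replacement: same device name, keep its osd id
> > >>> inplace = DriveGroupRequest(devices=["/dev/sdb"], replace_device="/dev/sdb")
> > >>> # general replacement: new device name, plus the osd id to apply to it
> > >>> general = DriveGroupRequest(devices=["/dev/sdd"], new_device_osd_id=12)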
> > >
> > > Seems like most of the steps for converting can be done by
> > > ceph-volume. Is polling safe-to-destroy the reason for placing this in
> > > the mgr vs. delegating the functionality to ceph-volume?
> > >
> > > From http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/?highlight=bluestore#mark-out-and-replace
> > > these are the steps that
> > > ceph-volume can handle today with internal APIs:
> > >
> > > * identify if an OSD is bluestore or filestore
> > > * identify what devices make up the OSD
> > > * stop/start/status on systemctl units
> > > * find the current mount point of an OSD, and see if devices are
> > > currently mounted at the target
> > > * mount and unmount
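> > >
> > > (For reference, the same facts are also reachable from public commands;
> > > a rough sketch follows. The commands are real, but the exact JSON fields
> > > may vary between versions, and the helper names are made up.)
> > >
> > > import json
> > > import subprocess
> > >
> > > def cmd(args):
> > >     return subprocess.run(args, capture_output=True, text=True)
> > >
> > > def objectstore(osd_id):
> > >     # "ceph osd metadata <id>" reports osd_objectstore (bluestore/filestore)
> > >     out = cmd(["ceph", "osd", "metadata", str(osd_id)]).stdout
> > >     return json.loads(out).get("osd_objectstore")
> > >
> > > def devices(osd_id):
> > >     # "ceph-volume lvm list --format json" maps osd ids to their LVs/devices
> > >     out = cmd(["ceph-volume", "lvm", "list", "--format", "json"]).stdout
> > >     return [lv.get("devices") for lv in json.loads(out).get(str(osd_id), [])]
> > >
> > > def is_running(osd_id):
> > >     return cmd(["systemctl", "is-active", "ceph-osd@%d" % osd_id]).returncode == 0
> > >
> > > def mount_point(osd_id):
> > >     # the usual mount point; check /proc/mounts to see whether it is mounted
> > >     target = "/var/lib/ceph/osd/ceph-%d" % osd_id
> > >     with open("/proc/mounts") as f:
> > >         return target if any(line.split()[1] == target for line in f) else None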
> > >
> > > The guide doesn't explain anything about encryption, which, as complex
> > > as that is today, might be something we don't want to try to handle in
> > > more than one place.
> > >
> > >
> > >>>
> > >>> This is a longer list than I'd like, but I don't see a way to make it
> > >>> shorter (with the exception of dropping the ability to grow groups).
> > >>>
> > >>> I've written a document to try and formalize this stuff a bit:
> > >>> https://docs.google.com/document/d/1iwTnQc8d9W3BpQHgGYTMZSKvN6J7s0z8kQaYNxYvLho
> > >>> (google docs may prompt you to ask for access)
> > >>>
> > >>> Just updating the orchestrator python code to reflect that doc now.
> > >>>
> > >>> John
> > >>>
> > >>>> Also it would need (or at least like) the ability to pass a list of OSD
> > >>>> IDs to reuse to the new batch prepare function you're building...
> > >>>
> > >>>
> > >>>
> > >>>> Thoughts?
> > >>>> sage
> > >>>>