Re: Filestore to Bluestore migration question

Thank you for your replies. So, if I use the method Hector suggested (creating the PVs, VGs, and LVs first), can I add the --osd-id parameter to the command, as in

ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0  --osd-id 0 
ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1  --osd-id 1

so that the Filestore -> Bluestore migration will not change the OSD ID on each disk?

And one more question: are there any changes I need to make to the ceph.conf file? I did comment out this line, which was probably used when the Filestore OSDs were created (with ceph-deploy):  osd journal size = 40960
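
For reference, this is the per-OSD sequence I was planning to run, pieced together from the migration doc linked below (osd.0, /dev/sdh1, and the LV names are just my example, so please tell me if any step is off):

# mark the OSD out and wait for the cluster to finish rebalancing
ceph osd out 0
# stop the daemon and unmount the old Filestore data directory
systemctl stop ceph-osd@0
umount /var/lib/ceph/osd/ceph-0
# destroy keeps the OSD id and CRUSH position but revokes the old auth key
ceph osd destroy 0 --yes-i-really-mean-it
# wipe the old Filestore data partition
ceph-volume lvm zap /dev/sdh1
# recreate as Bluestore, reusing id 0 and the pre-made LVs
ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0 --osd-id 0
ceph-volume lvm activate --all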



On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>
> On 31/10/2018 05:55, Hayashida, Mami wrote:
> > I am relatively new to Ceph and need some advice on Bluestore migration.
> > I tried migrating a few of our test cluster nodes from Filestore to
> > Bluestore by following this
> > (http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/)
> > as the cluster is currently running 12.2.9. The cluster, originally set
> > up by my predecessors, was running Jewel until I upgraded it recently to
> > Luminous.
> >
> > OSDs on each OSD host are set up in such a way that for every 10 data
> > HDDs, there is one SSD drive holding their journals.  For example,
> > osd.0's data is on /dev/sdh and its Filestore journal is on a
> > partition of /dev/sda. So, lsblk shows something like
> >
> > sda       8:0    0 447.1G  0 disk
> > ├─sda1    8:1    0    40G  0 part # journal for osd.0
> >
> > sdh       8:112  0   3.7T  0 disk
> > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
> >
>
> The BlueStore documentation states that the wal will automatically use
> the db volume if it fits, so if you're using a single SSD I think
> there's no good reason to split out the wal, if I'm understanding it
> correctly.

This is correct, no need for wal in this case.
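
If you want to double-check after converting, something like this should show a block.db device for each OSD and no separate wal entry when the wal is sharing the db volume (osd.0 is just an example, and the output details vary a bit by version):

ceph-volume lvm list
ls -l /var/lib/ceph/osd/ceph-0/   # should show block and block.db symlinks, no block.wal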

>
> You should be using ceph-volume, since ceph-disk is deprecated. If
> you're sharing the SSD as wal/db for a bunch of OSDs, I think you're
> going to have to create the LVs yourself first. The data HDDs should be
> PVs (I don't think it matters if they're partitions or whole disk PVs as
> long as LVM discovers them) each part of a separate VG (e.g. hdd0-hdd9)
> containing a single LV. Then the SSD should itself be an LV for a
> separate shared SSD VG (e.g. ssd).
>
> So something like (assuming sda is your wal SSD and sdb and onwards are
> your OSD HDDs):
> pvcreate /dev/sda
> pvcreate /dev/sdb
> pvcreate /dev/sdc
> ...
>
> vgcreate ssd /dev/sda
> vgcreate hdd0 /dev/sdb
> vgcreate hdd1 /dev/sdc
> ...
>
> lvcreate -L 40G -n db0 ssd
> lvcreate -L 40G -n db1 ssd
> ...
>
> lvcreate -l 100%VG -n data0 hdd0
> lvcreate -l 100%VG -n data1 hdd1
> ...
>
> ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
> ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1
> ...
>
> ceph-volume lvm activate --all
>
> I think it might be possible to just let ceph-volume create the PV/VG/LV
> for the data disks and only manually create the DB LVs, but it shouldn't
> hurt to do it on your own and just give ready-made LVs to ceph-volume
> for everything.

Another alternative here is to use the new `lvm batch` subcommand to
do all of this in one go:

ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    /dev/sdf /dev/sdg /dev/sdh

It will detect that sda is an SSD and create the block.db LVs for you (one
for each spinning disk), and it will place the OSD data on the spinning
disks themselves.

The one caveat is that you no longer control the OSD IDs: they are created
with whatever IDs the monitors hand out.

This operation is not supported from ceph-deploy either.
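
If you want to preview what batch would do before it touches anything, I believe you can add --report, which just prints the proposed layout without creating anything:

ceph-volume lvm batch --report /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    /dev/sdf /dev/sdg /dev/sdh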
>
> --
> Hector Martin (hector@xxxxxxxxxxxxxx)
> Public Key: https://marcan.st/marcan.asc



--
Mami Hayashida
Research Computing Associate

Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayashida@xxxxxxx
(859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
