Re: Filestore to Bluestore migration question

"Hayashida, Mami" <mami.hayashida@xxxxxxx> · Mon, 5 Nov 2018 12:01:04 -0500

I did find entries like this in /etc/fstab for those 10 disks:
/dev/sdh1   /var/lib/ceph/osd/ceph-60  xfs noatime,nodiratime 0 0

Should I comment all 10 of them out (for osd.{60-69}) and try rebooting again?
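Here is one way to do that. It assumes every stale entry looks like the one above, with the mount point /var/lib/ceph/osd/ceph-N as the second fstab field.

`comment_out_osd_mounts` is a hypothetical helper, not a Ceph tool. It adds a leading "#" to uncommented lines whose OSD ID falls in the given range. It leaves every other line untouched, including the entries for the other 50 Filestore OSDs that are still in use.

```shell
# Hypothetical helper: comment out fstab lines whose mount point is
# /var/lib/ceph/osd/ceph-N for N in FIRST..LAST.
# Usage: comment_out_osd_mounts FILE FIRST LAST
comment_out_osd_mounts() {
    file=$1
    first=$2
    last=$3
    # Single awk pass:
    #  - lines that are already comments are printed unchanged,
    #  - lines whose 2nd field is /var/lib/ceph/osd/ceph-<ID> with ID in
    #    FIRST..LAST get a leading "#",
    #  - everything else is printed unchanged.
    awk -v lo="$first" -v hi="$last" '
        /^[ \t]*#/ { print; next }
        {
            id = $2
            if (sub(/^\/var\/lib\/ceph\/osd\/ceph-/, "", id) &&
                id ~ /^[0-9]+$/ && id + 0 >= lo + 0 && id + 0 <= hi + 0) {
                print "#" $0
                next
            }
            print
        }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

To run it on the real file:

1. Keep a backup: `cp /etc/fstab /etc/fstab.bak`
2. Comment out the entries: `comment_out_osd_mounts /etc/fstab 60 69`
3. Check that only the ten intended lines changed: `diff /etc/fstab.bak /etc/fstab`

Running the helper twice is harmless, because lines that are already commented are skipped.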

On Mon, Nov 5, 2018 at 11:54 AM, Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
I was just going to write that the "ln" command did not solve the problem. When I rebooted the node, it again went into emergency mode and I got exactly the same errors (systemd[1]: Timed out waiting for device dev-sdh1.device. -- Subject: Unit dev-sdh1.device has failed...). I will look into /etc/fstab to see what I find there.

On Mon, Nov 5, 2018 at 11:51 AM, Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
Those units don't get triggered out of nowhere; there has to be a
partition table with magic GUIDs, an fstab entry, or something similar
that causes them to be triggered. I think the better approach is to get
rid of that trigger rather than override the ceph-disk service instances.

Given that dev-sdh1.device is trying to start, I suspect you have them in
/etc/fstab. You should have a look around /etc to see if you have any
stray references to those devices or to old ceph-disk OSDs.
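A recursive grep is a quick way to start looking around /etc. The sketch below uses a hypothetical helper name and assumes GNU grep (the Ubuntu default) for `-r`.

It only searches files under the directory you pass in. It cannot see triggers that live on the disks themselves, such as the partition-table GUIDs mentioned above.

```shell
# Hypothetical helper: list files under DIR that mention any of the given
# strings (old device names, old OSD mount points, "ceph-disk", ...).
# Usage: find_stray_refs DIR STRING...
find_stray_refs() {
    dir=$1
    shift
    for s in "$@"; do
        # -r recurse, -l print file names only, -s hide unreadable-file errors,
        # -F match the string literally; "|| true" because no match is fine.
        grep -rlsF -- "$s" "$dir" || true
    done | sort -u
}
```

For example: `find_stray_refs /etc /dev/sdh1 ceph-60 ceph-disk`. Plain substring matching means ceph-60 also matches ceph-600, which is harmless for a quick survey.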

On 11/6/18 1:37 AM, Hayashida, Mami wrote:
> Alright. Thanks -- I will try this now.
>
> On Mon, Nov 5, 2018 at 11:36 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>
>     On Mon, Nov 5, 2018 at 11:33 AM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >
>     > But I still have 50 other Filestore OSDs on the same node. Wouldn't doing them all at once (by not specifying the OSD ID) be a problem for those? I have not migrated data off those 50 OSDs yet.
>
>     Sure. As I said, if you want to do them one by one, then your
>     initial command is fine.
>
>     >
>     > On Mon, Nov 5, 2018 at 11:31 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>     >>
>     >> On Mon, Nov 5, 2018 at 11:24 AM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >
>     >> > Thank you for all of your replies. Just to clarify...
>     >> >
>     >> > 1. Hector: I did unmount the file system, if what you meant was unmounting /var/lib/ceph/osd/ceph-$osd-id for those disks (in my case osd.60-69), before running the ceph-volume lvm zap command.
>     >> >
>     >> > 2. Alfredo: so at this point I can run the "ln" command (basically masking the unit by pointing it at /dev/null) for each of the OSDs I have converted? For example:
>     >> >
>     >> > ln -sf /dev/null /etc/systemd/system/ceph-disk@60.service
>     >>
>     >> That will take care of OSD 60. This is fine if you want to do them
>     >> one by one. To affect everything from ceph-disk, you would need to:
>     >>
>     >> ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
>     >>
>     >> >
>     >> >    Then reboot?
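The `ln -sf /dev/null ...` trick has the same effect as `systemctl mask`: systemd treats a unit file that is a symlink to /dev/null as masked. The sketch below applies it to several units at once. `mask_units` is a hypothetical helper, and the directory is a parameter only so the function can be tried on a scratch directory.

In this thread the ln command alone did not clear the boot-time device timeouts. Stale /etc/fstab entries for the converted disks turned up instead.

```shell
# Hypothetical helper: mask systemd units the same way "systemctl mask"
# does, by making DIR/UNIT a symlink to /dev/null.
# Usage: mask_units DIR UNIT...
mask_units() {
    dir=$1
    shift
    for unit in "$@"; do
        # -s symbolic, -f replace an existing file or link of the same name
        ln -sf /dev/null "$dir/$unit"
    done
}
```

On the node itself, run `mask_units /etc/systemd/system ceph-disk@.service`. Follow it with `systemctl daemon-reload` so systemd re-reads its unit files.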

>     >> >
>     >> > On Mon, Nov 5, 2018 at 11:17 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>     >> >>
>     >> >> On Mon, Nov 5, 2018 at 10:43 AM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >> >
>     >> >> > Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not mounted at this point (i.e. "mount | grep ceph-60", and likewise for 61-69, returns nothing). They don't show up when I run "df", either.
>     >> >> >
>     >> >> > On Mon, Nov 5, 2018 at 10:15 AM, Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >> >>
>     >> >> >> Well, over the weekend the whole server went down and it is now in emergency mode (I am running Ubuntu 16.04). When I run "journalctl -p err -xb", I see:
>     >> >> >>
>     >> >> >> systemd[1]: Timed out waiting for device dev-sdh1.device.
>     >> >> >> -- Subject: Unit dev-sdh1.device has failed
>     >> >> >> -- Defined-By: systemd
>     >> >> >> -- Support: http://lists.freedesktop.org/..
>     >> >> >> --
>     >> >> >> -- Unit dev-sdh1.device has failed.
>     >> >> >>
>     >> >> >> I see this for every single one of the newly converted Bluestore OSD disks (/dev/sd{h..q}1).
>     >> >>
>     >> >> This will happen with stale ceph-disk systemd units. You can
>     >> >> disable those with:
>     >> >>
>     >> >> ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
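Each of these errors names a systemd device unit (dev-sdh1.device and so on), which is the escaped form of a block device path. The sketch below, with a hypothetical helper name, turns the journal output into a list of device paths so it can be compared with /etc/fstab and lsblk.

It assumes simple names like dev-sdh1. Names containing escaped characters (such as \x2d for a literal dash) are not decoded; `systemd-escape --unescape --path` is the more reliable tool for those.

```shell
# Hypothetical helper: read journal text on stdin and print each block
# device behind a "Timed out waiting for device dev-XXX.device" message.
# Step 1 (sed -n ... p): keep only the unit name, e.g. dev-sdh1.
# Step 2: undo systemd's path escaping for simple names (dev-sdh1 -> /dev/sdh1).
#         Escaped characters such as \x2d are NOT decoded here.
# Step 3: print each device once.
# Usage: journalctl -p err -xb | failed_devices
failed_devices() {
    sed -n 's/.*Timed out waiting for device \(dev-[^ ]*\)\.device.*/\1/p' |
        sed 's|-|/|g; s|^|/|' |
        sort -u
}
```

The result may list /dev/sdh1 through /dev/sdq1. The lsblk output in this thread shows only LVM volumes on sdh to sdq after the conversion, with no partitions. That would fit something still asking for the old partitions at boot.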

>     >> >>
>     >> >> >>
>     >> >> >> On Mon, Nov 5, 2018 at 9:57 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>     >> >> >>>
>     >> >> >>> On Fri, Nov 2, 2018 at 5:04 PM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >> >>> >
>     >> >> >>> > I followed all the steps Hector suggested, and almost everything seems to have worked fine. I say "almost" because one of the 10 OSDs I was migrating could not be activated, even though everything up to that point had worked just as well for that OSD as for the others. Here is the output for that particular failure:
>     >> >> >>> >
>     >> >> >>> > *****
>     >> >> >>> > ceph-volume lvm activate --all
>     >> >> >>> > ...
>     >> >> >>> > --> Activating OSD ID 67 FSID 17cd6755-76f9-4160-906c-XXXXXX
>     >> >> >>> > Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-67
>     >> >> >>> > --> Absolute path not found for executable: restorecon
>     >> >> >>> > --> Ensure $PATH environment variable contains common executable locations
>     >> >> >>> > Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/hdd67/data67 --path /var/lib/ceph/osd/ceph-67
>     >> >> >>> >  stderr: failed to read label for /dev/hdd67/data67: (2) No such file or directory
>     >> >> >>> > -->  RuntimeError: command returned non-zero exit status:
>     >> >> >>>
>     >> >> >>> I wonder if the /dev/sdo device, where hdd67/data67 is located,
>     >> >> >>> is available, or if something else is missing. You could try
>     >> >> >>> poking around with `lvs` to see if that LV shows up. Also,
>     >> >> >>> `ceph-volume lvm list hdd67/data67` can help here because it
>     >> >> >>> groups OSDs to LVs. If you run
>     >> >> >>> `ceph-volume lvm list --format=json hdd67/data67` you will also
>     >> >> >>> see all the metadata stored in it.
>     >> >> >>>
>     >> >> >>> It would be interesting to see that output to verify that things
>     >> >> >>> exist and are usable for OSD activation.
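A quick first check before running `lvs` and `ceph-volume lvm list`: does the path that ceph-bluestore-tool failed to open exist for each converted OSD? The sketch assumes the hddN/dataN naming shown in the lsblk output and uses a hypothetical helper name. It only tests for the /dev/VG/LV path, so `lvs` and `ceph-volume lvm list` remain the authoritative checks.

```shell
# Hypothetical helper: check that /dev/hddN/dataN exists for each OSD ID
# in FIRST..LAST (the VG/LV naming used on this node). DEV_ROOT defaults
# to /dev and exists only so the check can be tried elsewhere.
# Prints one line per OSD and returns 1 if any path is missing.
# Usage: check_osd_lvs FIRST LAST [DEV_ROOT]
check_osd_lvs() {
    id=$1
    last=$2
    root=${3:-/dev}
    rc=0
    while [ "$id" -le "$last" ]; do
        lv="$root/hdd$id/data$id"
        if [ -e "$lv" ]; then
            echo "ok: $lv"
        else
            echo "MISSING: $lv"
            rc=1
        fi
        id=$((id + 1))
    done
    return "$rc"
}
```

Run `check_osd_lvs 60 69` on the OSD node. If only 67 comes back MISSING, that would point at that LV (or /dev/sdo) rather than at the activation procedure.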

>     >> >> >>>
>     >> >> >>> >
>     >> >> >>> > *******
>     >> >> >>> > I then checked whether the rest of the migrated OSDs were back in by running the "ceph osd tree" command from the admin node. Since they were not, I tried to restart the first of the 10 newly migrated Bluestore OSDs by running:
>     >> >> >>> >
>     >> >> >>> > *******
>     >> >> >>> > systemctl start ceph-osd@60
>     >> >> >>> >
>     >> >> >>> > At that point, not only could this particular service not be started, but ALL the OSDs (daemons) on the entire node shut down!
>     >> >> >>> >
>     >> >> >>> > ******
>     >> >> >>> > root@osd1:~# systemctl status ceph-osd@60
>     >> >> >>> > ● ceph-osd@60.service - Ceph object storage daemon osd.60
>     >> >> >>> >    Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
>     >> >> >>> >    Active: inactive (dead) since Fri 2018-11-02 15:47:20 EDT; 1h 9min ago
>     >> >> >>> >   Process: 3473621 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>     >> >> >>> >   Process: 3473147 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>     >> >> >>> >  Main PID: 3473621 (code=exited, status=0/SUCCESS)
>     >> >> >>> >
>     >> >> >>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29 15:57:53.868856 7f68adaece00 -1 osd.60 48106 log_to_monitors {default=true}
>     >> >> >>> > Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29 15:57:53.874373 7f68adaece00 -1 osd.60 48106 mon_cmd_maybe_osd_create fail: 'you must complete the upgrade and 'ceph osd require-osd-release luminous' before using crush device classes': (1) Operation not permitted
>     >> >> >>> > Oct 30 06:25:01 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-30 06:25:01.961720 7f687feb3700 -1 received  signal: Hangup from  PID: 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Oct 31 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-31 06:25:02.110898 7f687feb3700 -1 received  signal: Hangup from  PID: 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 01 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-01 06:25:02.101548 7f687feb3700 -1 received  signal: Hangup from  PID: 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 02 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 06:25:01.997557 7f687feb3700 -1 received  signal: Hangup from  PID: 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 02 15:47:16 osd1.oxxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.322229 7f687feb3700 -1 received  signal: Terminated from  PID: 1 task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
>     >> >> >>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.322253 7f687feb3700 -1 osd.60 48504 *** Got signal Terminated ***
>     >> >> >>> > Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.676625 7f687feb3700 -1 osd.60 48504 shutdown
>     >> >> >>> > Nov 02 16:34:05 osd1.oxxxxx.uky.edu systemd[1]: Stopped Ceph object storage daemon osd.60.
>     >> >> >>> >
>     >> >> >>> > **********
>     >> >> >>> > And here is the output for one of the OSDs (osd.70, still using Filestore) that shut down right when I tried to start osd.60:
>     >> >> >>> >
>     >> >> >>> > ********
>     >> >> >>> >
>     >> >> >>> > root@osd1:~# systemctl status ceph-osd@70
>     >> >> >>> > ● ceph-osd@70.service - Ceph object storage daemon osd.70
>     >> >> >>> >    Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
>     >> >> >>> >    Active: inactive (dead) since Fri 2018-11-02 16:34:08 EDT; 2min 6s ago
>     >> >> >>> >   Process: 3473629 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
>     >> >> >>> >   Process: 3473153 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
>     >> >> >>> >  Main PID: 3473629 (code=exited, status=0/SUCCESS)
>     >> >> >>> >
>     >> >> >>> > Oct 29 15:57:51 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-29 15:57:51.300563 7f530eec2e00 -1 osd.70 pg_epoch: 48095 pg[68.ces1( empty local-lis/les=47489/47489 n=0 ec=6030/6030 lis/c 47488/47488 les/c/f 47489/47489/0 47485/47488/47488) [138,70,203]p138(0) r=1 lpr=0 crt=0'0 unknown NO
>     >> >> >>> > Oct 30 06:25:01 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-30 06:25:01.961743 7f52d8e44700 -1 received  signal: Hangup from  PID: 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Oct 31 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-31 06:25:02.110920 7f52d8e44700 -1 received  signal: Hangup from  PID: 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 01 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-01 06:25:02.101568 7f52d8e44700 -1 received  signal: Hangup from  PID: 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 02 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 06:25:01.997633 7f52d8e44700 -1 received  signal: Hangup from  PID: 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
>     >> >> >>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.607714 7f52d8e44700 -1 received  signal: Terminated from  PID: 1 task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
>     >> >> >>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.607738 7f52d8e44700 -1 osd.70 48535 *** Got signal Terminated ***
>     >> >> >>> > Nov 02 16:34:05 osd1.xxxx.uky.edu systemd[1]: Stopping Ceph object storage daemon osd.70...
>     >> >> >>> > Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.677348 7f52d8e44700 -1 osd.70 48535 shutdown
>     >> >> >>> > Nov 02 16:34:08 osd1.xxxx.uky.edu systemd[1]: Stopped Ceph object storage daemon osd.70.
>     >> >> >>> >
>     >> >> >>> > **************
>     >> >> >>> >
>     >> >> >>> > So, at this point, ALL the OSDs on that node have been shut down.
>     >> >> >>> >
>     >> >> >>> > For your information, this is the output of the lsblk command (selection):
>     >> >> >>> > *****
>     >> >> >>> > root@osd1:~# lsblk
>     >> >> >>> > NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>     >> >> >>> > sda              8:0    0 447.1G  0 disk
>     >> >> >>> > ├─ssd0-db60    252:0    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db61    252:1    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db62    252:2    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db63    252:3    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db64    252:4    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db65    252:5    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db66    252:6    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db67    252:7    0    40G  0 lvm
>     >> >> >>> > ├─ssd0-db68    252:8    0    40G  0 lvm
>     >> >> >>> > └─ssd0-db69    252:9    0    40G  0 lvm
>     >> >> >>> > sdb              8:16   0 447.1G  0 disk
>     >> >> >>> > ├─sdb1           8:17   0    40G  0 part
>     >> >> >>> > ├─sdb2           8:18   0    40G  0 part
>     >> >> >>> >
>     >> >> >>> > .....
>     >> >> >>> >
>     >> >> >>> > sdh              8:112  0   3.7T  0 disk
>     >> >> >>> > └─hdd60-data60 252:10   0   3.7T  0 lvm
>     >> >> >>> > sdi              8:128  0   3.7T  0 disk
>     >> >> >>> > └─hdd61-data61 252:11   0   3.7T  0 lvm
>     >> >> >>> > sdj              8:144  0   3.7T  0 disk
>     >> >> >>> > └─hdd62-data62 252:12   0   3.7T  0 lvm
>     >> >> >>> > sdk              8:160  0   3.7T  0 disk
>     >> >> >>> > └─hdd63-data63 252:13   0   3.7T  0 lvm
>     >> >> >>> > sdl              8:176  0   3.7T  0 disk
>     >> >> >>> > └─hdd64-data64 252:14   0   3.7T  0 lvm
>     >> >> >>> > sdm              8:192  0   3.7T  0 disk
>     >> >> >>> > └─hdd65-data65 252:15   0   3.7T  0 lvm
>     >> >> >>> > sdn              8:208  0   3.7T  0 disk
>     >> >> >>> > └─hdd66-data66 252:16   0   3.7T  0 lvm
>     >> >> >>> > sdo              8:224  0   3.7T  0 disk
>     >> >> >>> > └─hdd67-data67 252:17   0   3.7T  0 lvm
>     >> >> >>> > sdp              8:240  0   3.7T  0 disk
>     >> >> >>> > └─hdd68-data68 252:18   0   3.7T  0 lvm
>     >> >> >>> > sdq             65:0    0   3.7T  0 disk
>     >> >> >>> > └─hdd69-data69 252:19   0   3.7T  0 lvm
>     >> >> >>> > sdr             65:16   0   3.7T  0 disk
>     >> >> >>> > └─sdr1          65:17   0   3.7T  0 part /var/lib/ceph/osd/ceph-70
>     >> >> >>> > .....
>     >> >> >>> >
>     >> >> >>> > As a Ceph novice, I am totally clueless about the next step at this point. Any help would be appreciated.
>     >> >> >>> >
>     >> >> >>> > On Thu, Nov 1, 2018 at 3:16 PM, Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >> >>> >>
>     >> >> >>> >> Thank you, both of you. I will try this out very soon.
>     >> >> >>> >>
>     >> >> >>> >> On Wed, Oct 31, 2018 at 8:48 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>     >> >> >>> >>>
>     >> >> >>> >>> On Wed, Oct 31, 2018 at 8:28 AM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>     >> >> >>> >>> >
>     >> >> >>> >>> > Thank you for your replies. So, if I use the method Hector suggested (creating the PVs, VGs, etc. first), can I add the --osd-id parameter to the command, as in
>     >> >> >>> >>> >
>     >> >> >>> >>> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0 --osd-id 0
>     >> >> >>> >>> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1 --osd-id 1
>     >> >> >>> >>> >
>     >> >> >>> >>> > so that the Filestore -> Bluestore migration will not change the OSD ID on each disk?
>     >> >> >>> >>>
>     >> >> >>> >>> That looks correct.
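For ten OSDs it can help to generate these commands rather than type them. The sketch below is a dry run: it only prints one `ceph-volume lvm prepare` per OSD ID, reusing the ID with --osd-id.

- **Assumed names:** hddN/dataN for data and ssd0/dbN for block.db, taken from the lsblk listing above. They are assumptions for any other node.
- **Assumed prior step:** the Luminous migration guide linked in the original question destroys the old OSD (`ceph osd destroy $ID --yes-i-really-mean-it`) before recreating it with the same ID. The sketch assumes that step is already done and does not perform it.

```shell
# Hypothetical helper (dry run): print one "ceph-volume lvm prepare" command
# per OSD ID in FIRST..LAST, keeping each OSD's ID via --osd-id.
# Assumed names: data LV hddN/dataN, DB LV ssd0/dbN (as in the lsblk output).
# Nothing is executed.
# Usage: print_prepare_cmds FIRST LAST
print_prepare_cmds() {
    id=$1
    while [ "$id" -le "$2" ]; do
        echo "ceph-volume lvm prepare --bluestore --data hdd$id/data$id --block.db ssd0/db$id --osd-id $id"
        id=$((id + 1))
    done
}
```

For example, `print_prepare_cmds 60 69 > prepare.sh` writes the ten commands to a file that can be reviewed before running anything by hand.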

>     >> >> >>> >>>
>     >> >> >>> >>> >
>     >> >> >>> >>> > And one more question. Are there any changes I need to make to the ceph.conf file? I did comment out this line, which was probably used for creating the Filestore OSDs (with ceph-deploy): osd journal size = 40960
>     >> >> >>> >>>
>     >> >> >>> >>> Since you've pre-created the LVs, the commented-out line will not
>     >> >> >>> >>> affect anything.
>     >> >> >>> >>>
>     >> >> >>> >>> >
>     >> >> >>> >>> > On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > On 31/10/2018 05:55, Hayashida, Mami wrote:
>     >> >> >>> >>> >> > > I am relatively new to Ceph and need some advice on Bluestore migration.
>     >> >> >>> >>> >> > > I tried migrating a few of our test cluster nodes from Filestore to
>     >> >> >>> >>> >> > > Bluestore by following this guide
>     >> >> >>> >>> >> > > (http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/),
>     >> >> >>> >>> >> > > as the cluster is currently running 12.2.9. The cluster, originally set
>     >> >> >>> >>> >> > > up by my predecessors, was running Jewel until I recently upgraded it to
>     >> >> >>> >>> >> > > Luminous.
>     >> >> >>> >>> >> > >
>     >> >> >>> >>> >> > > The OSDs on each OSD host are set up so that for every 10 data HDDs
>     >> >> >>> >>> >> > > there is one SSD holding their journals. For example, osd.0 data is
>     >> >> >>> >>> >> > > on /dev/sdh and its Filestore journal is on a partition of /dev/sda.
>     >> >> >>> >>> >> > > So, lsblk shows something like
>     >> >> >>> >>> >> > >
>     >> >> >>> >>> >> > > sda       8:0    0 447.1G  0 disk
>     >> >> >>> >>> >> > > ├─sda1    8:1    0    40G  0 part # journal for osd.0
>     >> >> >>> >>> >> > >
>     >> >> >>> >>> >> > > sdh       8:112  0   3.7T  0 disk
>     >> >> >>> >>> >> > > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>     >> >> >>> >>> >> > >
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > The BlueStore documentation states that the WAL will automatically use
>     >> >> >>> >>> >> > the DB volume if it fits, so if you're using a single SSD I think
>     >> >> >>> >>> >> > there's no good reason to split out the WAL, if I'm understanding it
>     >> >> >>> >>> >> > correctly.
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> This is correct, no need for a WAL in this case.
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > You should be using ceph-volume, since ceph-disk is deprecated. If
>     >> >> >>> >>> >> > you're sharing the SSD as WAL/DB for a bunch of OSDs, I think you're
>     >> >> >>> >>> >> > going to have to create the LVs yourself first. The data HDDs should be
>     >> >> >>> >>> >> > PVs (I don't think it matters if they're partitions or whole-disk PVs,
>     >> >> >>> >>> >> > as long as LVM discovers them), each part of a separate VG (e.g.
>     >> >> >>> >>> >> > hdd0-hdd9) containing a single LV. Then the SSD should itself be a PV
>     >> >> >>> >>> >> > in a separate shared SSD VG (e.g. ssd).
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > So something like this (assuming sda is your DB SSD and sdb onwards
>     >> >> >>> >>> >> > are your OSD HDDs):
>     >> >> >>> >>> >> > pvcreate /dev/sda
>     >> >> >>> >>> >> > pvcreate /dev/sdb
>     >> >> >>> >>> >> > pvcreate /dev/sdc
>     >> >> >>> >>> >> > ...
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > vgcreate ssd /dev/sda
>     >> >> >>> >>> >> > vgcreate hdd0 /dev/sdb
>     >> >> >>> >>> >> > vgcreate hdd1 /dev/sdc
>     >> >> >>> >>> >> > ...
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > lvcreate -L 40G -n db0 ssd
>     >> >> >>> >>> >> > lvcreate -L 40G -n db1 ssd
>     >> >> >>> >>> >> > ...
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > lvcreate -l 100%VG -n data0 hdd0
>     >> >> >>> >>> >> > lvcreate -l 100%VG -n data1 hdd1
>     >> >> >>> >>> >> > ...
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
>     >> >> >>> >>> >> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1
>     >> >> >>> >>> >> > ...
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > ceph-volume lvm activate --all
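Here is the same recipe as a dry run that prints the commands for any list of HDDs.

- **Naming:** it follows the recipe above: VG ssd with LVs dbN, and VGs hddN with LVs dataN, numbered from 0. The helper name itself is hypothetical.
- **Ordering:** it emits the commands disk by disk rather than grouping all the pvcreates first, which should make no difference to LVM.
- **Sizes:** lvcreate takes a percentage such as 100%VG through -l (extents), while -L expects an absolute size like 40G.

```shell
# Hypothetical helper (dry run): print the LVM + ceph-volume commands for
# one shared DB SSD and a list of data HDDs, numbering the HDDs from 0.
# Names follow the recipe: VG "ssd" with LVs dbN, VGs hddN with LV dataN.
# Usage: print_lvm_layout SSD DB_SIZE HDD...
print_lvm_layout() {
    ssd=$1
    db_size=$2
    shift 2
    echo "pvcreate $ssd"
    echo "vgcreate ssd $ssd"
    n=0
    for hdd in "$@"; do
        echo "pvcreate $hdd"
        echo "vgcreate hdd$n $hdd"
        echo "lvcreate -L $db_size -n db$n ssd"
        # -l takes extents, including percentages such as 100%VG;
        # -L takes an absolute size such as 40G.
        echo "lvcreate -l 100%VG -n data$n hdd$n"
        echo "ceph-volume lvm prepare --bluestore --data hdd$n/data$n --block.db ssd/db$n"
        n=$((n + 1))
    done
    echo "ceph-volume lvm activate --all"
}
```

For example, `print_lvm_layout /dev/sda 40G /dev/sdb /dev/sdc` prints the commands for the first two HDDs in the layout described above.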

>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > I think it might be possible to just let ceph-volume create the PV/VG/LV
>     >> >> >>> >>> >> > for the data disks and only manually create the DB LVs, but it shouldn't
>     >> >> >>> >>> >> > hurt to do it on your own and just give ready-made LVs to ceph-volume
>     >> >> >>> >>> >> > for everything.
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> Another alternative here is to use the new `lvm batch` subcommand to
>     >> >> >>> >>> >> do all of this in one go:
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> This will detect that sda is an SSD and create the block.db LVs for
>     >> >> >>> >>> >> you (one for each spinning disk), and it will place the data on each
>     >> >> >>> >>> >> spinning disk.
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> The one caveat is that you no longer control the OSD IDs; they are
>     >> >> >>> >>> >> created with whatever the monitors hand out.
>     >> >> >>> >>> >>
>     >> >> >>> >>> >> This operation is not supported from ceph-deploy either.
>     >> >> >>> >>> >> >
>     >> >> >>> >>> >> > --
>     >> >> >>> >>> >> > Hector Martin (hector@xxxxxxxxxxxxxx)
>     >> >> >>> >>> >> > Public Key: https://marcan.st/marcan.asc
>     >> >> >>> >>> >> > _______________________________________________
>     >> >> >>> >>> >> > ceph-users mailing list
>     >> >> >>> >>> >> > ceph-users@xxxxxxxxxxxxxx
>     >> >> >>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>     >> >> >>> >>> >
>     >> >> >>> >>> > --
>     >> >> >>> >>> > Mami Hayashida
>     >> >> >>> >>> > Research Computing Associate
>     >> >> >>> >>> >
>     >> >> >>> >>> > Research Computing Infrastructure
>     >> >> >>> >>> > University of Kentucky Information Technology Services
>     >> >> >>> >>> > 301 Rose Street | 102 James F. Hardymon Building
>     >> >> >>> >>> > Lexington, KY 40506-0495
>     >> >> >>> >>> > mami.hayashida@xxxxxxx
>     >> >> >>> >>> > (859)323-7521

-- 
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
