Re: Filestore to Bluestore migration question

I followed all the steps Hector suggested, and almost everything seems to have worked fine.  I say "almost" because one of the 10 OSDs I was migrating could not be activated, even though every step up to that point worked just as well for that OSD as for the others.  Here is the output for that particular failure:

*****
ceph-volume lvm activate --all
...
--> Activating OSD ID 67 FSID 17cd6755-76f9-4160-906c-XXXXXX
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-67
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/hdd67/data67 --path /var/lib/ceph/osd/ceph-67
 stderr: failed to read label for /dev/hdd67/data67: (2) No such file or directory
-->  RuntimeError: command returned non-zero exit status:

*******
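Before retrying, I am planning to double-check whether the data LV for osd.67 is visible at all. This is a rough sketch of what I intend to run, with the device names taken from the output above:

*****
# Does the LV device node exist?
ls -l /dev/hdd67/data67

# Does LVM itself see the VG/LV?
lvs hdd67

# If the device is present, can the bluestore label be read directly?
ceph-bluestore-tool show-label --dev /dev/hdd67/data67
*****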
I then checked whether the rest of the migrated OSDs were back in by running the ceph osd tree command from the admin node.  Since they were not, I tried to restart the first of the 10 newly migrated Bluestore OSDs by running

*******
systemctl start ceph-osd@60

At that point, not only could this particular service not be started, but ALL the OSD daemons on the entire node shut down!

******
root@osd1:~# systemctl status ceph-osd@60
● ceph-osd@60.service - Ceph object storage daemon osd.60
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-11-02 15:47:20 EDT; 1h 9min ago
  Process: 3473621 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 3473147 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 3473621 (code=exited, status=0/SUCCESS)

Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29 15:57:53.868856 7f68adaece00 -1 osd.60 48106 log_to_monitors {default=true}
Oct 29 15:57:53 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-29 15:57:53.874373 7f68adaece00 -1 osd.60 48106 mon_cmd_maybe_osd_create fail: 'you must complete the upgrade and 'ceph osd require-osd-release luminous' before using crush device classes': (1) Operation not permitted
Oct 30 06:25:01 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-30 06:25:01.961720 7f687feb3700 -1 received  signal: Hangup from  PID: 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Oct 31 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-10-31 06:25:02.110898 7f687feb3700 -1 received  signal: Hangup from  PID: 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 01 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-01 06:25:02.101548 7f687feb3700 -1 received  signal: Hangup from  PID: 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 02 06:25:02 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 06:25:01.997557 7f687feb3700 -1 received  signal: Hangup from  PID: 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 02 15:47:16 osd1.oxxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.322229 7f687feb3700 -1 received  signal: Terminated from  PID: 1 task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.322253 7f687feb3700 -1 osd.60 48504 *** Got signal Terminated ***
Nov 02 15:47:16 osd1.xxxxx.uky.edu ceph-osd[3473621]: 2018-11-02 15:47:16.676625 7f687feb3700 -1 osd.60 48504 shutdown
Nov 02 16:34:05 osd1.oxxxxx.uky.edu systemd[1]: Stopped Ceph object storage daemon osd.60.

********** 
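I notice the log above mentions 'ceph osd require-osd-release luminous'.  If I am reading that right, I can verify that flag from the admin node with something like:

*****
ceph osd dump | grep require_osd_release
*****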
And here is the output for one of the OSDs (osd.70, still using Filestore) that shut down right when I tried to start osd.60:

********

root@osd1:~# systemctl status ceph-osd@70
● ceph-osd@70.service - Ceph object storage daemon osd.70
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
   Active: inactive (dead) since Fri 2018-11-02 16:34:08 EDT; 2min 6s ago
  Process: 3473629 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
  Process: 3473153 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 3473629 (code=exited, status=0/SUCCESS)

Oct 29 15:57:51 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-29 15:57:51.300563 7f530eec2e00 -1 osd.70 pg_epoch: 48095 pg[68.ces1( empty local-lis/les=47489/47489 n=0 ec=6030/6030 lis/c 47488/47488 les/c/f 47489/47489/0 47485/47488/47488) [138,70,203]p138(0) r=1 lpr=0 crt=0'0 unknown NO
Oct 30 06:25:01 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-30 06:25:01.961743 7f52d8e44700 -1 received  signal: Hangup from  PID: 3485955 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Oct 31 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-10-31 06:25:02.110920 7f52d8e44700 -1 received  signal: Hangup from  PID: 3500945 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 01 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-01 06:25:02.101568 7f52d8e44700 -1 received  signal: Hangup from  PID: 3514774 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 02 06:25:02 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 06:25:01.997633 7f52d8e44700 -1 received  signal: Hangup from  PID: 3528128 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.607714 7f52d8e44700 -1 received  signal: Terminated from  PID: 1 task name: /lib/systemd/systemd --system --deserialize 20  UID: 0
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.607738 7f52d8e44700 -1 osd.70 48535 *** Got signal Terminated ***
Nov 02 16:34:05 osd1.xxxx.uky.edu systemd[1]: Stopping Ceph object storage daemon osd.70...
Nov 02 16:34:05 osd1.xxxx.uky.edu ceph-osd[3473629]: 2018-11-02 16:34:05.677348 7f52d8e44700 -1 osd.70 48535 shutdown
Nov 02 16:34:08 osd1.xxxx.uky.edu systemd[1]: Stopped Ceph object storage daemon osd.70.

**************

So, at this point, ALL the OSDs on that node have been shut down.
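In case it is useful, I can also gather the state of all the OSD units on this node, for example (a sketch; I have not tried to start ceph-osd.target or anything else yet):

*****
# State of every per-OSD systemd unit on this host
systemctl list-units 'ceph-osd@*' --all

# The target that groups all OSD units on the host
systemctl status ceph-osd.target
*****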

For your information, this is the output of the lsblk command (selection):
*****
root@osd1:~# lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 447.1G  0 disk 
├─ssd0-db60    252:0    0    40G  0 lvm  
├─ssd0-db61    252:1    0    40G  0 lvm  
├─ssd0-db62    252:2    0    40G  0 lvm  
├─ssd0-db63    252:3    0    40G  0 lvm  
├─ssd0-db64    252:4    0    40G  0 lvm  
├─ssd0-db65    252:5    0    40G  0 lvm  
├─ssd0-db66    252:6    0    40G  0 lvm  
├─ssd0-db67    252:7    0    40G  0 lvm  
├─ssd0-db68    252:8    0    40G  0 lvm  
└─ssd0-db69    252:9    0    40G  0 lvm  
sdb              8:16   0 447.1G  0 disk 
├─sdb1           8:17   0    40G  0 part 
├─sdb2           8:18   0    40G  0 part 

.....

sdh              8:112  0   3.7T  0 disk 
└─hdd60-data60 252:10   0   3.7T  0 lvm  
sdi              8:128  0   3.7T  0 disk 
└─hdd61-data61 252:11   0   3.7T  0 lvm  
sdj              8:144  0   3.7T  0 disk 
└─hdd62-data62 252:12   0   3.7T  0 lvm  
sdk              8:160  0   3.7T  0 disk 
└─hdd63-data63 252:13   0   3.7T  0 lvm  
sdl              8:176  0   3.7T  0 disk 
└─hdd64-data64 252:14   0   3.7T  0 lvm  
sdm              8:192  0   3.7T  0 disk 
└─hdd65-data65 252:15   0   3.7T  0 lvm  
sdn              8:208  0   3.7T  0 disk 
└─hdd66-data66 252:16   0   3.7T  0 lvm  
sdo              8:224  0   3.7T  0 disk 
└─hdd67-data67 252:17   0   3.7T  0 lvm  
sdp              8:240  0   3.7T  0 disk 
└─hdd68-data68 252:18   0   3.7T  0 lvm  
sdq             65:0    0   3.7T  0 disk 
└─hdd69-data69 252:19   0   3.7T  0 lvm  
sdr             65:16   0   3.7T  0 disk 
└─sdr1          65:17   0   3.7T  0 part /var/lib/ceph/osd/ceph-70
.....

As a Ceph novice, I am at a loss as to what to do next.  Any help would be appreciated.

On Thu, Nov 1, 2018 at 3:16 PM, Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
Thank you, both of you.  I will try this out very soon.  

On Wed, Oct 31, 2018 at 8:48 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
On Wed, Oct 31, 2018 at 8:28 AM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>
> Thank you for your replies. So, if I use the method Hector suggested (creating the PVs, VGs, etc. first), can I add the --osd-id parameter to the command, as in
>
> ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0  --osd-id 0
> ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1  --osd-id 1
>
> so that Filestore -> Bluestore migration will not change the osd ID on each disk?

That looks correct.

>
> And one more question.  Are there any changes I need to make to the ceph.conf file?  I did comment out this line that was probably used for creating Filestore (using ceph-deploy):  osd journal size = 40960

Since you've pre-created the LVs, the commented-out line will not
affect anything.

>
>
>
> On Wed, Oct 31, 2018 at 7:03 AM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>
>> On Wed, Oct 31, 2018 at 5:22 AM Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>> >
>> > On 31/10/2018 05:55, Hayashida, Mami wrote:
>> > > I am relatively new to Ceph and need some advice on Bluestore migration.
>> > > I tried migrating a few of our test cluster nodes from Filestore to
>> > > Bluestore by following this
>> > > (http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/)
>> > > as the cluster is currently running 12.2.9. The cluster, originally set
>> > > up by my predecessors, was running Jewel until I upgraded it recently to
>> > > Luminous.
>> > >
>> > > OSDs in each OSD host are set up in such a way that for every 10 data HDD
>> > > disks, there is one SSD drive that is holding their journals.  For
>> > > example, osd.0 data is on /dev/sdh and its Filestore journal is on a
>> > > partitioned part of /dev/sda. So, lsblk shows something like
>> > >
>> > > sda       8:0    0 447.1G  0 disk
>> > > ├─sda1    8:1    0    40G  0 part # journal for osd.0
>> > >
>> > > sdh       8:112  0   3.7T  0 disk
>> > > └─sdh1    8:113  0   3.7T  0 part /var/lib/ceph/osd/ceph-0
>> > >
>> >
>> > The BlueStore documentation states that the wal will automatically use
>> > the db volume if it fits, so if you're using a single SSD I think
>> > there's no good reason to split out the wal, if I'm understanding it
>> > correctly.
>>
>> This is correct, no need for wal in this case.
>>
>> >
>> > You should be using ceph-volume, since ceph-disk is deprecated. If
>> > you're sharing the SSD as wal/db for a bunch of OSDs, I think you're
>> > going to have to create the LVs yourself first. The data HDDs should be
>> > PVs (I don't think it matters if they're partitions or whole disk PVs as
>> > long as LVM discovers them) each part of a separate VG (e.g. hdd0-hdd9)
>> > containing a single LV. Then the SSD should itself be an LV for a
>> > separate shared SSD VG (e.g. ssd).
>> >
>> > So something like (assuming sda is your wal SSD and sdb and onwards are
>> > your OSD HDDs):
>> > pvcreate /dev/sda
>> > pvcreate /dev/sdb
>> > pvcreate /dev/sdc
>> > ...
>> >
>> > vgcreate ssd /dev/sda
>> > vgcreate hdd0 /dev/sdb
>> > vgcreate hdd1 /dev/sdc
>> > ...
>> >
>> > lvcreate -L 40G -n db0 ssd
>> > lvcreate -L 40G -n db1 ssd
>> > ...
>> >
>> > lvcreate -l 100%VG -n data0 hdd0
>> > lvcreate -l 100%VG -n data1 hdd1
>> > ...
>> >
>> > ceph-volume lvm prepare --bluestore --data hdd0/data0 --block.db ssd/db0
>> > ceph-volume lvm prepare --bluestore --data hdd1/data1 --block.db ssd/db1
>> > ...
>> >
>> > ceph-volume lvm activate --all
>> >
>> > I think it might be possible to just let ceph-volume create the PV/VG/LV
>> > for the data disks and only manually create the DB LVs, but it shouldn't
>> > hurt to do it on your own and just give ready-made LVs to ceph-volume
>> > for everything.
>>
>> Another alternative here is to use the new `lvm batch` subcommand to
>> do all of this in one go:
>>
>> ceph-volume lvm batch /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
>> /dev/sdf /dev/sdg /dev/sdh
>>
>> Will detect that sda is an SSD and will create the LVs for you for
>> block.db (one for each spinning disk). For each spinning disk, it will
>> place data on them.
>>
>> The one caveat is that you no longer control OSD IDs, and they are
>> created with whatever the monitors are giving out.
>>
>> This operation is not supported from ceph-deploy either.
>> >
>> > --
>> > Hector Martin (hector@xxxxxxxxxxxxxx)
>> > Public Key: https://marcan.st/marcan.asc
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayashida@xxxxxxx
> (859)323-7521



--
Mami Hayashida
Research Computing Associate

Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayashida@xxxxxxx
(859)323-7521



--
Mami Hayashida
Research Computing Associate

Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayashida@xxxxxxx
(859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
