Re: Filestore to Bluestore migration question

WOW.  With you two guiding me through every step, the 10 OSDs in question have now been added back to the cluster as Bluestore disks!!!  Here are my responses to the last email from Hector:

1. I first checked the permissions, and they looked like this:

root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
total 56
-rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
-rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60

root@osd1:~# ls -l /dev/ssd0/
...
lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2

root@osd1:~# ls -la /dev/
...
brw-rw----  1 root disk    252,   2 Nov  5 12:38 dm-2
...

2. I then ran ceph-volume lvm activate --all again and saw the same error for osd.67 that I described many emails ago.  None of the permissions changed.  I tried restarting ceph-osd@60, but got the same error as before:

2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa 
2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
2018-11-05 15:34:52.598269 7f5a15744e00 -1  ** ERROR: osd init failed: (13) Permission denied

3. Finally, I literally copied and pasted the udev rule Hector wrote out for me, then rebooted the server.  

4. I tried restarting ceph-osd@60 -- this time it came right up!!!  I was able to start all the rest, including ceph-osd@67, which I had thought was not activated by lvm.

5. From the admin node, I verified that osd.60 through osd.69 are indeed all in the cluster as Bluestore OSDs.
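
For anyone retracing this thread later, one way to make that check from the admin node (a sketch, not necessarily the exact commands used here; osd.60 is just one of the ten IDs):

ceph osd metadata 60 | grep osd_objectstore   # should report "bluestore"
ceph osd tree                                 # confirm osd.60-69 are up and in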

********************
Thank you SO MUCH, both of you, for putting up with my novice questions all the way.  I am planning to convert the rest of the cluster the same way, reviewing this entire thread to trace the steps that need to be taken.

Mami

On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hector@xxxxxxxxxxxxxx> wrote:


On 11/6/18 3:31 AM, Hayashida, Mami wrote:
> 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied

Looks like the permissions on the block.db device are wrong. As far as I
know, ceph-volume is responsible for setting them at activation time.

> I already ran the "ceph-volume lvm activate --all "  command right after
> I prepared (using "lvm prepare") those OSDs.  Do I need to run the
> "activate" command again?

The activation is required on every boot to create the
/var/lib/ceph/osd/* directory, but that should be automatically done by
systemd units (since you didn't run it after the reboot and yet the
directories exist, it seems to have worked).
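
One way to see whether those activation units ran after boot (a standard systemctl query, nothing specific to this setup):

systemctl list-units --all 'ceph-volume@*'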

Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and also
any devices symlinked to from there, to see the permissions?

Then run the activate command again and list the permissions again to
see if they have changed, and if they have, try to start the OSD again.
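
Concretely, that sequence might look like this (using osd.60 and the paths from your logs):

ls -l /var/lib/ceph/osd/ceph-60/
ls -lL /var/lib/ceph/osd/ceph-60/block.db   # -L follows the symlink to the underlying dm device
ceph-volume lvm activate --all
ls -lL /var/lib/ceph/osd/ceph-60/block.db   # compare owner/group/mode with the first listing
systemctl start ceph-osd@60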

I found one Ubuntu bug that suggests there may be a race condition:

https://bugs.launchpad.net/bugs/1767087

I get the feeling the ceph-osd activation may be happening before the
block.db device is ready, so when it gets created by LVM it's already
too late and doesn't have the right permissions. You could fix it with a
udev rule (like Ubuntu did) but if this is indeed your issue then it
sounds like something that should be fixed in Ceph. Perhaps all you need
is a systemd unit override to make sure ceph-volume@* services only
start after LVM is ready.
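
A minimal sketch of such an override (a drop-in for Ceph's templated ceph-volume@.service; the LVM unit names below are assumptions, so check what your distribution actually ships with systemctl list-units):

# /etc/systemd/system/ceph-volume@.service.d/wait-for-lvm.conf
[Unit]
After=lvm2-activation.service lvm2-lvmetad.service

Run systemctl daemon-reload afterwards so the drop-in takes effect.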

A usable udev rule could look like this (e.g. put it in
/etc/udev/rules.d/90-lvm-permissions.rules):

ACTION="" SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
OWNER="ceph", GROUP="ceph", MODE="660"

Reboot after that and see if the OSDs come up without further action.
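
As an aside, if you want to test a rule like this without a full reboot, udev rules can usually be reloaded and re-triggered in place (standard udevadm usage, untested in this particular setup):

udevadm control --reload-rules
udevadm trigger --action=change --subsystem-match=block

The change event should re-match the ACTION=="add|change" rule above and re-apply the ownership.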

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub



--
Mami Hayashida
Research Computing Associate

Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayashida@xxxxxxx
(859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
