Re: Filestore to Bluestore migration question

On Mon, Nov 5, 2018 at 4:04 PM Hayashida, Mami <mami.hayashida@xxxxxxx> wrote:
>
> WOW.  With you two guiding me through every step, the 10 OSDs in question have now been added back to the cluster as Bluestore disks!!!  Here are my responses to the last email from Hector:
>
> 1. I first checked the permissions, and they looked like this:
>
> root@osd1:/var/lib/ceph/osd/ceph-60# ls -l
> total 56
> -rw-r--r-- 1 ceph ceph         384 Nov  2 16:20 activate.monmap
> -rw-r--r-- 1 ceph ceph 10737418240 Nov  2 16:20 block
> lrwxrwxrwx 1 ceph ceph          14 Nov  2 16:20 block.db -> /dev/ssd0/db60
>
> root@osd1:~# ls -l /dev/ssd0/
> ...
> lrwxrwxrwx 1 root root 7 Nov  5 12:38 db60 -> ../dm-2
>
> root@osd1:~# ls -la /dev/
> ...
> brw-rw----  1 root disk    252,   2 Nov  5 12:38 dm-2

This looks like a bug. You mentioned you are running 12.2.9, and we
haven't seen ceph-volume fail to update the permissions on OSD
devices. No one should need a udev rule to set the permissions on
devices; that is ceph-volume's job.

When the system starts and OSD activation happens, it always ensures
that the permissions are set correctly. Could you find the section of
/var/log/ceph/ceph-volume.log that shows the activation process for
ssd0/db60?

Hopefully you still have those logs around; they would help us
determine why the permissions aren't being set correctly.
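
Something like this should pull out the relevant lines (adjust the
pattern to taste):

grep -B 2 -A 10 'db60' /var/log/ceph/ceph-volume.log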

> ...
>
> 2. I then ran "ceph-volume lvm activate --all" again and saw the same error for osd.67 that I described many emails ago.  None of the permissions changed.  I tried restarting ceph-osd@60, but got the same error as before:
>
> 2018-11-05 15:34:52.001782 7f5a15744e00  0 set uid:gid to 64045:64045 (ceph:ceph)
> 2018-11-05 15:34:52.001808 7f5a15744e00  0 ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable), process ceph-osd, pid 36506
> 2018-11-05 15:34:52.021717 7f5a15744e00  0 pidfile_write: ignore empty --pid-file
> 2018-11-05 15:34:52.033478 7f5a15744e00  0 load: jerasure load: lrc load: isa
> 2018-11-05 15:34:52.033557 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.033572 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.033888 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.033958 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.033984 7f5a15744e00  1 bdev(0x5651bd1b8d80 /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.318993 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _mount path /var/lib/ceph/osd/ceph-60
> 2018-11-05 15:34:52.319064 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block type kernel
> 2018-11-05 15:34:52.319073 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open path /var/lib/ceph/osd/ceph-60/block
> 2018-11-05 15:34:52.319356 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) open size 10737418240 (0x280000000, 10GiB) block_size 4096 (4KiB) rotational
> 2018-11-05 15:34:52.319415 7f5a15744e00  1 bluestore(/var/lib/ceph/osd/ceph-60) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
> 2018-11-05 15:34:52.319491 7f5a15744e00  1 bdev create path /var/lib/ceph/osd/ceph-60/block.db type kernel
> 2018-11-05 15:34:52.319499 7f5a15744e00  1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open path /var/lib/ceph/osd/ceph-60/block.db
> 2018-11-05 15:34:52.319514 7f5a15744e00 -1 bdev(0x5651bd1b9200 /var/lib/ceph/osd/ceph-60/block.db) open open got: (13) Permission denied
> 2018-11-05 15:34:52.319648 7f5a15744e00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
> 2018-11-05 15:34:52.319666 7f5a15744e00  1 bdev(0x5651bd1b8fc0 /var/lib/ceph/osd/ceph-60/block) close
> 2018-11-05 15:34:52.598249 7f5a15744e00 -1 osd.60 0 OSD:init: unable to mount object store
> 2018-11-05 15:34:52.598269 7f5a15744e00 -1  ** ERROR: osd init failed: (13) Permission denied
>
> 3. Finally, I literally copied and pasted the udev rule Hector wrote out for me, then rebooted the server.
>
> 4. I tried restarting ceph-osd@60 -- this time it came right up!!!  I was able to start all the rest, including ceph-osd@67, which I had thought was never activated by lvm.
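>
> (In case it helps anyone following this thread: from a bash shell, something like "systemctl restart ceph-osd@{60..69}" restarts the whole range in one command.)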
>
> 5. I checked from the admin node and verified that osd.60-69 are indeed all in the cluster as Bluestore OSDs.
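>
> (One way to double-check from the admin node is something like "ceph osd metadata 60 | grep osd_objectstore", which should report "bluestore" for a converted OSD.)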
>
> ********************
> Thank you SO MUCH, both of you, for putting up with my novice questions all the way.  I am planning to convert the rest of the cluster the same way, reviewing this entire thread to trace the steps that need to be taken.
>
> Mami
>
> On Mon, Nov 5, 2018 at 3:00 PM, Hector Martin <hector@xxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 11/6/18 3:31 AM, Hayashida, Mami wrote:
>> > 2018-11-05 12:47:01.075573 7f1f2775ae00 -1 bluestore(/var/lib/ceph/osd/ceph-60) _open_db add block device(/var/lib/ceph/osd/ceph-60/block.db) returned: (13) Permission denied
>>
>> Looks like the permissions on the block.db device are wrong. As far as I
>> know ceph-volume is responsible for setting this at activation time.
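>>
>> (As a quick test you could also chown the device by hand, e.g.
>> "chown ceph:ceph /dev/dm-2", but that won't survive a reboot, so
>> it's only a stopgap.)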
>>
>> > I already ran the "ceph-volume lvm activate --all "  command right after
>> > I prepared (using "lvm prepare") those OSDs.  Do I need to run the
>> > "activate" command again?
>>
>> Activation is required on every boot to create the
>> /var/lib/ceph/osd/* directory, but it should be done automatically by
>> systemd units (since you didn't run it after the reboot and yet the
>> directories exist, it seems to have worked).
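>>
>> Something like this (exact unit names can vary by release) should
>> show whether the activation units ran at boot:
>>
>> systemctl list-units 'ceph-volume@*' 'ceph-osd@*'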
>>
>> Can you ls -l the OSD directory (/var/lib/ceph/osd/ceph-60/) and any
>> devices it symlinks to, so we can see the permissions?
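>>
>> ("ls -lL /var/lib/ceph/osd/ceph-60/" follows the symlinks and shows
>> the permissions of the underlying devices directly.)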
>>
>> Then run the activate command once more and list the permissions
>> again to see whether they changed; if they did, try starting the OSD
>> again.
>>
>> I found one Ubuntu bug that suggests there may be a race condition:
>>
>> https://bugs.launchpad.net/bugs/1767087
>>
>> I get the feeling the ceph-osd activation may be happening before the
>> block.db device is ready, so by the time LVM creates the device,
>> activation has already run and the device is left without the right
>> permissions. You could fix it with a udev rule (like Ubuntu did), but
>> if this is indeed your issue, it sounds like something that should be
>> fixed in Ceph. Perhaps all you need is a systemd unit override to make
>> sure the ceph-volume@* services only start after LVM is ready.
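>>
>> A sketch of such an override (the LVM unit names here are assumptions
>> and depend on how your distro activates LVM):
>>
>> # /etc/systemd/system/ceph-volume@.service.d/after-lvm.conf
>> [Unit]
>> After=lvm2-activation.service lvm2-activation-early.service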
>>
>> A usable udev rule could look like this (e.g. put it in
>> /etc/udev/rules.d/90-lvm-permissions.rules):
>>
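>> # On change events for block devices whose LV name matches db* in VG
>> # ssd0, force ownership ceph:ceph and mode 660: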
>> ACTION=="change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
>> ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
>> OWNER="ceph", GROUP="ceph", MODE="660"
>>
>> Reboot after that and see if the OSDs come up without further action.
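>>
>> (You may be able to apply the rule without a full reboot with
>> "udevadm control --reload-rules" followed by "udevadm trigger
>> --action=change --subsystem-match=block", but rebooting is the
>> cleaner test, since the race happens at boot.)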
>>
>> --
>> Hector Martin (hector@xxxxxxxxxxxxxx)
>> Public Key: https://mrcn.st/pub
>
>
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayashida@xxxxxxx
> (859)323-7521
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


