Re: is it possible to remove the db+wal from an external device (nvme)


 



That's what I did and pasted the results in my previous comments.


Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:

Yes. And the “cephadm shell” command does not depend on the running daemon; it starts a new container. So I think it is perfectly fine to stop the OSD first, then run “cephadm shell” and run ceph-volume in the new shell.
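
A minimal sketch of that workflow (assuming osd.0; the cluster fsid, OSD fsid and target LV are placeholders you would have to fill in):

# stop the containerized OSD on the host (the unit name contains the cluster fsid)
systemctl stop ceph-<cluster-fsid>@osd.0.service

# open a temporary container with the osd.0 config and data directory mapped in
cephadm shell -n osd.0

# inside that shell: move the DB back onto the main block device
ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db --target <vg>/<osd-block-lv>

# leave the shell and start the OSD again
exit
systemctl start ceph-<cluster-fsid>@osd.0.service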

From: Eugen Block<mailto:eblock@xxxxxx>
Sent: September 29, 2021 21:40
To: 胡 玮文<mailto:huww98@xxxxxxxxxxx>
Cc: Igor Fedotov<mailto:ifedotov@xxxxxxx>; Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
Subject: Re: is it possible to remove the db+wal from an external device (nvme)

The OSD has to be stopped in order to migrate DB/WAL; it can't be done
live. ceph-volume requires a lock on the device.
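
For a cephadm-managed OSD that stop can be done either through the orchestrator or via the systemd unit on the host (a sketch; osd.0 and the cluster fsid are placeholders):

# via the orchestrator, from any host with a working ceph CLI
ceph orch daemon stop osd.0

# or directly on the OSD host; the unit name embeds the cluster fsid
systemctl stop ceph-<cluster-fsid>@osd.0.service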


Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:

I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The
directory structure seems fine.

$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config
/var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
root@host0:/# ll /var/lib/ceph/osd/ceph-0/
total 68
drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
-rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
-rw------- 1 ceph ceph  387 Jun 21 13:24 config
-rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
-rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
-rw------- 1 ceph ceph    6 Sep 20 04:15 ready
-rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
-rw------- 1 ceph ceph   10 Sep 20 04:15 type
-rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
-rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
-rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
-rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
-rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
-rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw------- 1 ceph ceph    2 Sep 20 04:15 whoami

From: Eugen Block<mailto:eblock@xxxxxx>
Sent: September 29, 2021 21:29
To: Igor Fedotov<mailto:ifedotov@xxxxxxx>
Cc: 胡 玮文<mailto:huww98@xxxxxxxxxxx>; Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

Hi Igor,

thanks for your input. I haven't done this in a prod env yet either,
I'm still playing around in a virtual lab env.
I tried the symlink suggestion, but it's not that easy, because the
layout underneath the ceph directory looks different from what
ceph-volume expects. These are the services underneath:

ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
total 48
drwx------ 3 root       root   4096 16. Sep 16:11 alertmanager.ses7-host1
drwx------ 3 ceph       ceph   4096 29. Sep 09:03 crash
drwx------ 2 ceph       ceph   4096 16. Sep 16:39 crash.ses7-host1
drwx------ 4 messagebus lp     4096 16. Sep 16:23 grafana.ses7-host1
drw-rw---- 2 root       root   4096 24. Aug 10:00 home
drwx------ 2 ceph       ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
drwx------ 3 ceph       ceph   4096 16. Sep 16:37 mon.ses7-host1
drwx------ 2 nobody     nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
drwx------ 2 ceph       ceph   4096 29. Sep 08:43 osd.0
drwx------ 2 ceph       ceph   4096 29. Sep 15:11 osd.1
drwx------ 4 root       root   4096 16. Sep 16:12 prometheus.ses7-host1


While the directory in a non-containerized deployment looks like this:

nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
total 24
lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block -> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
-rw------- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
-rw------- 1 ceph ceph 37 29. Sep 12:21 fsid
-rw------- 1 ceph ceph 55 29. Sep 12:21 keyring
-rw------- 1 ceph ceph  6 29. Sep 12:21 ready
-rw------- 1 ceph ceph 10 29. Sep 12:21 type
-rw------- 1 ceph ceph  2 29. Sep 12:21 whoami


But even if I create the symlink to the OSD directory it fails, because
I only have ceph-volume within the containers, and the host-side
symlink is not visible there.


ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/

ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
[...]
/usr/bin/podman: stderr --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
/usr/bin/podman: stderr stdout: inferring bluefs devices from bluestore path
/usr/bin/podman: stderr  stderr: can't migrate /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
/usr/bin/podman: stderr --> Failed to migrate device, error code:1
/usr/bin/podman: stderr --> Undoing lv tag set
/usr/bin/podman: stderr Failed to migrate to : ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Traceback (most recent call last):
   File "/usr/sbin/cephadm", line 6225, in <module>
     r = args.func()
   File "/usr/sbin/cephadm", line 1363, in _infer_fsid
     return func()
   File "/usr/sbin/cephadm", line 1422, in _infer_image
     return func()
   File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
     out, err, code = call_throws(c.run_cmd(), verbosity=CallVerbosity.VERBOSE)
   File "/usr/sbin/cephadm", line 1101, in call_throws
     raise RuntimeError('Failed command: %s' % ' '.join(command))
[...]


I could install the ceph-osd package (which contains ceph-volume),
but it's not available by default (as you can see, this is a SES 7
environment).

I'm not sure what the intended design is here; it feels like the
ceph-volume migrate command is not applicable to containerized
deployments yet.

Regards,
Eugen


Quoting Igor Fedotov <ifedotov@xxxxxxx>:

Hi Eugen,

indeed, this looks like an issue related to containerized deployments:
"ceph-volume lvm migrate" expects the OSD folder to be under
/var/lib/ceph/osd:

stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1
bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock
/var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still
running?)(11) Resource temporarily unavailable

As a workaround you might want to try creating a symlink to your
actual location before issuing the migrate command:
/var/lib/ceph/osd ->
/var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
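
Concretely, that would be something along these lines (using the fsid and OSD id from this thread; adjust for your cluster):

# make the containerized OSD directory visible where ceph-volume expects it
mkdir -p /var/lib/ceph/osd
ln -s /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1 /var/lib/ceph/osd/ceph-1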

A more complicated (and more general, IMO) way would be to run the
migrate command from within a container deployed similarly to the
ceph-osd one (i.e. with all the proper subfolder mappings). Just
speculating - I'm not a big expert in containers and have never tried
that with a properly deployed production cluster...
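
Purely as a speculative sketch of that idea (the image name, mounts and LV names are placeholders, and the container would need the same bind mounts the real osd.1 container gets):

podman run --rm -it --privileged \
  -v /var/lib/ceph/<cluster-fsid>/osd.1:/var/lib/ceph/osd/ceph-1 \
  -v /dev:/dev \
  -v /run/lvm:/run/lvm \
  --entrypoint ceph-volume \
  <ceph-image> \
  lvm migrate --osd-id 1 --osd-fsid <osd-fsid> --from db --target <vg>/<osd-block-lv>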


Thanks,

Igor

On 9/29/2021 10:07 AM, Eugen Block wrote:
Hi,

I just tried 'ceph-volume lvm migrate' in Octopus, but it
doesn't really work. I'm not sure if I'm missing something here,
but I believe it's the already discussed containers issue again. To
be able to run the command for an OSD, the OSD has to be offline,
but then you don't have access to the block.db because the path is
different outside the container:

---snip---
[ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
--> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
 stdout: inferring bluefs devices from bluestore path
 stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180 time 2021-09-29T06:56:24.790161+0000
 stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: 6876: FAILED ceph_assert(r == 0)
 stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable


# path outside
host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
total 60
lrwxrwxrwx 1 ceph ceph   93 29. Sep 08:43 block -> /dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
lrwxrwxrwx 1 ceph ceph   90 29. Sep 08:43 block.db -> /dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f
---snip---


But if I shut down the OSD I can't access the block and block.db
devices. I'm not even sure how this is supposed to work with
cephadm. Maybe I'm misunderstanding, though. Or is there a way to
provide the offline block.db path to 'ceph-volume lvm migrate'?



Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:

You may need to use 'ceph-volume lvm migrate' [1] instead of
ceph-bluestore-tool. If I recall correctly, this is a pretty new
feature; I'm not sure whether it is available in your version.

If you use ceph-bluestore-tool, then you need to modify the LVM
tags manually. Please refer to the previous threads, e.g. [2] and
some more.

[1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
[2]:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
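
For reference, the general shape of the command from [1] is roughly as follows (the OSD id/fsid and the target VG/LV are placeholders; run it with the OSD stopped):

# moves BlueFS data (here: the DB) off the source device(s) onto the target LV
ceph-volume lvm migrate --osd-id <id> --osd-fsid <osd-fsid> --from db --target <vg>/<lv>

If you stay with ceph-bluestore-tool instead, the LVM tags on the data LV (for example ceph.db_device / ceph.db_uuid) would presumably need to be cleared with lvchange --deltag, as discussed in [2] - but double-check the exact tag names there.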

From: Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>
Sent: September 28, 2021 18:20
To: Eugen Block<mailto:eblock@xxxxxx>; ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
Subject: Re: is it possible to remove the db+wal from an external device (nvme)

I gave it a try, and all 3 OSDs eventually failed :/ Not sure
what went wrong.

I did the normal maintenance things (ceph osd set noout, ceph osd set
norebalance), stopped the OSD and ran this command:
ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-0/block --devs-source /var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/
Output:
device removed:1 /var/lib/ceph/osd/ceph-8/block.db
device added: 1 /dev/dm-2

When I tried to start it I got this in the log:
osd.8 0 OSD:init: unable to mount object store
 ** ERROR: osd init failed: (13) Permission denied
set uid:gid to 167:167 (ceph:ceph)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable), process ceph-osd, pid 1512261
pidfile_write: ignore empty --pid-file
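
(One thing that might be worth checking here - an assumption, not something confirmed in this thread - is ownership: ceph-bluestore-tool runs as root, so the OSD directory contents or the newly added device may no longer be owned by ceph:ceph, which would match the "(13) Permission denied":)

# hypothetical check/fix for the permission error after the migration
ls -ln /var/lib/ceph/osd/ceph-8/
chown -h ceph:ceph /var/lib/ceph/osd/ceph-8/block
chown ceph:ceph /dev/dm-2    # the device reported as added above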

On the other 2 OSDs the block.db was removed and I could start them
again. I then zapped the db drive to remove it from the device
completely, and after a machine restart neither of these 2 OSDs came
back - I guess they are missing the db device.

Are any steps missing? (See also the command sketch after this list.)
1. noout + norebalance
2. Stop the OSD.
3. Migrate the block.db to the block device with the above command.
4. Do the same on the other OSDs that share the db device I want to remove.
5. Zap the db device.
6. Start the OSDs again.
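
As a command sketch of those steps (non-containerized layout as in the paths above; OSD ids and the db device are placeholders):

ceph osd set noout
ceph osd set norebalance

# step 2: stop the OSD
systemctl stop ceph-osd@8

# step 3: merge the external block.db back into the main block device
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-8 \
    --devs-source /var/lib/ceph/osd/ceph-8/block.db \
    --dev-target /var/lib/ceph/osd/ceph-8/block

# step 4: repeat for every other OSD whose DB lives on the same device

# step 5: zap the db device only after no OSD references it any more
ceph-volume lvm zap --destroy /dev/<nvme-db-device>

# step 6: start the OSDs again and clear the flags
systemctl start ceph-osd@8
ceph osd unset norebalance
ceph osd unset noout

Note that ceph-bluestore-tool itself does not touch the LVM tags (see the note further up about editing them manually), which could explain why the OSDs no longer came up after the db device was zapped.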

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, September 27, 2021 7:42 PM
To: ceph-users@xxxxxxx
Subject:  Re: is it possible to remove the db+wal from
an external device (nvme)


Hi,

I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use
here. I haven't tried it in a production environment yet, only in
virtual labs.
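
The rough shape of that command (run with the OSD stopped; the OSD id and paths are placeholders) would be:

# merges the BlueFS contents of block.db into the main block device
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-<id> \
    --devs-source /var/lib/ceph/osd/ceph-<id>/block.db \
    --dev-target /var/lib/ceph/osd/ceph-<id>/block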

Regards,
Eugen


Zitat von "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:

Hi,

It seems like in our config the NVMe device used as WAL+DB in front of
the SSDs is slowing down the SSD OSDs.
I'd like to avoid rebuilding all the OSDs - is there a way to somehow
migrate the WAL+DB to the "slower device" without reinstalling?

Ty









_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



