Wow, it works like a charm 😊 Thank you very much. I tried it in my lab, however I needed to update the cluster to 15.2.14 first, because the migrate command only became available in that version. Not sure I can update while in an error state, though. Very smooth:

num=14; ceph-volume lvm migrate --osd-id $num --osd-fsid `cat /var/lib/ceph/osd/ceph-$num/fsid` --from db --target ceph-72a124f4-1fc5-49ee-85c8-24fb2618e9e5/osd-block-d8d42a38-2130-4a2b-9479-c93800ad4029
--> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-14/block.db'] Target: /var/lib/ceph/osd/ceph-14/block
--> Migration successful.

Thank you Eugen and Igor.

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Thursday, September 30, 2021 1:10 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Cc: 胡 玮文 <huww98@xxxxxxxxxxx>; Igor Fedotov <ifedotov@xxxxxxx>; ceph-users@xxxxxxx
Subject: Re: is it possible to remove the db+wal from an external device (nvme)

Yes, I believe for you it should work without containers, although I haven't tried the migrate command in a non-containerized cluster yet. But I believe this is a general issue for containerized clusters with regard to maintenance. I haven't checked yet if there are existing tracker issues for this, but maybe it would be worth creating one?

Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:

> Actually I don't have a containerized deployment, mine is a normal
> one, so the lvm migrate should work.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Wednesday, September 29, 2021 8:49 PM
> To: 胡 玮文 <huww98@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxx>; Szabo, Istvan (Agoda)
> <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
> Subject: Re: is it possible to remove the db+wal from an external
> device (nvme)
>
> That's what I did, and I pasted the results in my previous comments.
>
>
> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>
>> Yes. And the “cephadm shell” command does not depend on the running
>> daemon, it will start a new container. So I think it is perfectly
>> fine to stop the OSD first, then run the “cephadm shell” command and
>> run ceph-volume in the new shell.
>>
>> From: Eugen Block<mailto:eblock@xxxxxx>
>> Sent: September 29, 2021 21:40
>> To: 胡 玮文<mailto:huww98@xxxxxxxxxxx>
>> Cc: Igor Fedotov<mailto:ifedotov@xxxxxxx>; Szabo, Istvan
>> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>> Subject: Re: is it possible to remove the db+wal from an external device
>> (nvme)
>>
>> The OSD has to be stopped in order to migrate DB/WAL, it can't be
>> done live. ceph-volume requires a lock on the device.
>>
>>
>> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>>
>>> I’ve not tried it, but how about:
>>>
>>> cephadm shell -n osd.0
>>>
>>> then run “ceph-volume” commands in the newly opened shell. The
>>> directory structure seems fine.
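>>> Untested, but the whole sequence would presumably look something
>>> like this (the systemd unit name and the target LV are just guesses
>>> taken from my output below):
>>>
>>> # on the host: stop the OSD so its daemon container goes away
>>> systemctl stop ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864@osd.0
>>> # start a fresh container with the osd.0 mounts in place
>>> cephadm shell -n osd.0
>>> # inside the shell: merge block.db back into the main device
>>> ceph-volume lvm migrate --osd-id 0 \
>>>   --osd-fsid $(cat /var/lib/ceph/osd/ceph-0/fsid) \
>>>   --from db --target ceph-hdd/osd.0.data
>>> exit
>>> systemctl start ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864@osd.0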
>>>
>>> $ sudo cephadm shell -n osd.0
>>> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
>>> Inferring config /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
>>> Using recent ceph image cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
>>> root@host0:/# ll /var/lib/ceph/osd/ceph-0/
>>> total 68
>>> drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
>>> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
>>> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
>>> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
>>> -rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
>>> -rw------- 1 ceph ceph  387 Jun 21 13:24 config
>>> -rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
>>> -rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
>>> -rw------- 1 ceph ceph    6 Sep 20 04:15 ready
>>> -rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
>>> -rw------- 1 ceph ceph   10 Sep 20 04:15 type
>>> -rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
>>> -rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
>>> -rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
>>> -rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
>>> -rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
>>> -rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
>>> -rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
>>> -rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
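>>>
>>> (To double-check which LVs back an OSD before migrating, something
>>> like this should work from the same shell, just as a quick sanity
>>> check:
>>>
>>> readlink /var/lib/ceph/osd/ceph-0/block.db
>>> ceph-volume lvm list   # lists the [block] and [db] devices per OSD
>>> )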
>>>
>>> From: Eugen Block<mailto:eblock@xxxxxx>
>>> Sent: September 29, 2021 21:29
>>> To: Igor Fedotov<mailto:ifedotov@xxxxxxx>
>>> Cc: 胡 玮文<mailto:huww98@xxxxxxxxxxx>; Szabo, Istvan
>>> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
>>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>>> Subject: Re: Re: Re: [ceph-users] Re: is it possible to
>>> remove the db+wal from an external device (nvme)
>>>
>>> Hi Igor,
>>>
>>> thanks for your input. I haven't done this in a prod env yet either,
>>> I'm still playing around in a virtual lab env.
>>> I tried the symlink suggestion, but it's not that easy, because the
>>> layout underneath the ceph directory is different from what
>>> ceph-volume expects. These are the services underneath:
>>>
>>> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
>>> insgesamt 48
>>> drwx------ 3 root       root   4096 16. Sep 16:11 alertmanager.ses7-host1
>>> drwx------ 3 ceph       ceph   4096 29. Sep 09:03 crash
>>> drwx------ 2 ceph       ceph   4096 16. Sep 16:39 crash.ses7-host1
>>> drwx------ 4 messagebus lp     4096 16. Sep 16:23 grafana.ses7-host1
>>> drw-rw---- 2 root       root   4096 24. Aug 10:00 home
>>> drwx------ 2 ceph       ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
>>> drwx------ 3 ceph       ceph   4096 16. Sep 16:37 mon.ses7-host1
>>> drwx------ 2 nobody     nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
>>> drwx------ 2 ceph       ceph   4096 29. Sep 08:43 osd.0
>>> drwx------ 2 ceph       ceph   4096 29. Sep 15:11 osd.1
>>> drwx------ 4 root       root   4096 16. Sep 16:12 prometheus.ses7-host1
>>>
>>>
>>> While the directory in a non-containerized deployment looks like this:
>>>
>>> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
>>> insgesamt 24
>>> lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block -> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
>>> -rw------- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
>>> -rw------- 1 ceph ceph 37 29. Sep 12:21 fsid
>>> -rw------- 1 ceph ceph 55 29. Sep 12:21 keyring
>>> -rw------- 1 ceph ceph  6 29. Sep 12:21 ready
>>> -rw------- 1 ceph ceph 10 29. Sep 12:21 type
>>> -rw------- 1 ceph ceph  2 29. Sep 12:21 whoami
>>>
>>>
>>> But even if I create the symlink to the osd directory it fails,
>>> because I only have ceph-volume within the containers, where the
>>> symlink is not visible to cephadm.
>>>
>>>
>>> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
>>> lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>>>
>>> ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>> Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
>>> [...]
>>> /usr/bin/podman: stderr --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
>>> /usr/bin/podman: stderr stdout: inferring bluefs devices from bluestore path
>>> /usr/bin/podman: stderr stderr: can't migrate /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
>>> /usr/bin/podman: stderr --> Failed to migrate device, error code:1
>>> /usr/bin/podman: stderr --> Undoing lv tag set
>>> /usr/bin/podman: stderr Failed to migrate to : ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>> Traceback (most recent call last):
>>>   File "/usr/sbin/cephadm", line 6225, in <module>
>>>     r = args.func()
>>>   File "/usr/sbin/cephadm", line 1363, in _infer_fsid
>>>     return func()
>>>   File "/usr/sbin/cephadm", line 1422, in _infer_image
>>>     return func()
>>>   File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
>>>     out, err, code = call_throws(c.run_cmd(), verbosity=CallVerbosity.VERBOSE)
>>>   File "/usr/sbin/cephadm", line 1101, in call_throws
>>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
>>> [...]
>>>
>>>
>>> I could install the ceph-osd package (which ceph-volume is packaged
>>> in), but it's not available by default (as you can see, this is a
>>> SES 7 environment).
>>>
>>> I'm not sure what the design is here; it feels like the ceph-volume
>>> migrate command is not applicable to containers yet.
>>>
>>> Regards,
>>> Eugen
>>>
>>>
>>> Quoting Igor Fedotov <ifedotov@xxxxxxx>:
>>>
>>>> Hi Eugen,
>>>>
>>>> indeed this looks like an issue related to containerized
>>>> deployment, "ceph-volume lvm migrate" expects the osd folder to be
>>>> under /var/lib/ceph/osd:
>>>>
>>>>> stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1
>>>>> bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock
>>>>> /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still
>>>>> running?)(11) Resource temporarily unavailable
>>>>
>>>> As a workaround you might want to try to create a symlink to your
>>>> actual location before issuing the migrate command:
>>>> /var/lib/ceph/osd -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
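>>>>
>>>> Or per OSD, since the subfolder names differ (osd.1 vs. ceph-1).
>>>> Just a sketch of what I mean, speculating with the names from your
>>>> listing:
>>>>
>>>> systemctl stop ceph-152fd738-01bc-11ec-a7fd-fa163e672db2@osd.1
>>>> mkdir -p /var/lib/ceph/osd
>>>> ln -s /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1 /var/lib/ceph/osd/ceph-1
>>>> cephadm ceph-volume lvm migrate --osd-id 1 \
>>>>   --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 \
>>>>   --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4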
>>>>
>>>> A more complicated (and more general, IMO) way would be to run the
>>>> migrate command from within a container deployed similarly (i.e.
>>>> with all the proper subfolder mappings) to the ceph-osd one. Just
>>>> speculating - I'm not a big expert in containers and have never
>>>> tried that with a properly deployed production cluster...
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 9/29/2021 10:07 AM, Eugen Block wrote:
>>>>> Hi,
>>>>>
>>>>> I just tried with 'ceph-volume lvm migrate' in Octopus, but it
>>>>> doesn't really work. I'm not sure if I'm missing something here,
>>>>> but I believe it's again the already discussed containers issue.
>>>>> To be able to run the command for an OSD, the OSD has to be
>>>>> offline, but then you don't have access to the block.db because
>>>>> the path is different from outside the container:
>>>>>
>>>>> ---snip---
>>>>> [ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>>>> --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
>>>>>  stdout: inferring bluefs devices from bluestore path
>>>>>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180 time 2021-09-29T06:56:24.790161+0000
>>>>>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: 6876: FAILED ceph_assert(r == 0)
>>>>>  stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
>>>>>
>>>>>
>>>>> # path outside
>>>>> host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>>>>> insgesamt 60
>>>>> lrwxrwxrwx 1 ceph ceph 93 29. Sep 08:43 block -> /dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>>>> lrwxrwxrwx 1 ceph ceph 90 29. Sep 08:43 block.db -> /dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f
>>>>> ---snip---
>>>>>
>>>>>
>>>>> But if I shut down the OSD, I can't access the block and block.db
>>>>> devices. I'm not even sure how this is supposed to work with
>>>>> cephadm. Maybe I'm misunderstanding, though. Or is there a way to
>>>>> provide the offline block.db path to 'ceph-volume lvm migrate'?
>>>>>
>>>>>
>>>>>
>>>>> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>>>>>
>>>>>> You may need to use `ceph-volume lvm migrate' [1] instead of
>>>>>> ceph-bluestore-tool. If I recall correctly, this is a pretty new
>>>>>> feature; I'm not sure whether it is available in your version.
>>>>>>
>>>>>> If you use ceph-bluestore-tool, then you need to modify the LVM
>>>>>> tags manually. Please refer to the previous threads, e.g. [2] and
>>>>>> some more.
>>>>>>
>>>>>> [1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
>>>>>> [2]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
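>>>>>>
>>>>>> Roughly like this, from memory (check the linked thread for the
>>>>>> exact tags; the VG/LV names below are just placeholders):
>>>>>>
>>>>>> # see which ceph.* tags the block LV carries
>>>>>> lvs -o lv_tags <vg>/<osd-block-lv>
>>>>>> # after bluefs-bdev-migrate, drop the stale db references
>>>>>> lvchange --deltag ceph.db_device=<old-db-device> <vg>/<osd-block-lv>
>>>>>> lvchange --deltag ceph.db_uuid=<old-db-uuid> <vg>/<osd-block-lv>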
>>>>>>
>>>>>> From: Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>
>>>>>> Sent: September 28, 2021 18:20
>>>>>> To: Eugen Block<mailto:eblock@xxxxxx>;
>>>>>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>>>>>> Subject: Re: is it possible to remove the db+wal from an
>>>>>> external device (nvme)
>>>>>>
>>>>>> I gave it a try, and all 3 osds finally failed :/ Not sure what
>>>>>> went wrong.
>>>>>>
>>>>>> I did the normal maintenance things (ceph osd set noout, ceph osd
>>>>>> set norebalance), stopped the osd and ran this command:
>>>>>> ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-0/block --devs-source /var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/
>>>>>>
>>>>>> Output:
>>>>>> device removed:1 /var/lib/ceph/osd/ceph-8/block.db
>>>>>> device added: 1 /dev/dm-2
>>>>>>
>>>>>> When I tried to start it, I got this in the log:
>>>>>> osd.8 0 OSD:init: unable to mount object store
>>>>>> ** ERROR: osd init failed: (13) Permission denied
>>>>>> set uid:gid to 167:167 (ceph:ceph)
>>>>>> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable), process ceph-osd, pid 1512261
>>>>>> pidfile_write: ignore empty --pid-file
>>>>>>
>>>>>> On the other 2 osds the block.db was removed and I could start
>>>>>> them back up. I zapped the db drive to remove it from the device
>>>>>> completely, but after a machine restart neither of these 2 osds
>>>>>> came back; I guess they are missing the db device.
>>>>>>
>>>>>> Are any steps missing?
>>>>>> 1. noout + norebalance
>>>>>> 2. Stop the osd
>>>>>> 3. Migrate the block.db to the block device with the above command
>>>>>> 4. Do the same on the other osds sharing the db device I want to remove
>>>>>> 5. Zap the db device
>>>>>> 6. Start the osds again
>>>>>>
>>>>>> Istvan Szabo
>>>>>> Senior Infrastructure Engineer
>>>>>> ---------------------------------------------------
>>>>>> Agoda Services Co., Ltd.
>>>>>> e: istvan.szabo@xxxxxxxxx
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Eugen Block <eblock@xxxxxx>
>>>>>> Sent: Monday, September 27, 2021 7:42 PM
>>>>>> To: ceph-users@xxxxxxx
>>>>>> Subject: Re: is it possible to remove the db+wal
>>>>>> from an external device (nvme)
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use
>>>>>> here. I haven't tried it in a production environment yet, only in
>>>>>> virtual labs.
>>>>>>
>>>>>> Regards,
>>>>>> Eugen
>>>>>>
>>>>>>
>>>>>> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It seems like in our config the nvme device serving as wal+db in
>>>>>>> front of the ssds is slowing down the ssd osds.
>>>>>>> I'd like to avoid rebuilding all the osds. Is there a way to
>>>>>>> somehow migrate the wal+db to the "slower" device without a
>>>>>>> reinstall?
>>>>>>>
>>>>>>> Ty
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx