Re: is it possible to remove the db+wal from an external device (nvme)

Actually, I don't have a containerized deployment; mine is a normal one. So the lvm migrate should work.
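
For a non-containerized OSD, the sequence discussed in this thread could look roughly like the sketch below. It is a dry run that only prints the commands. The function name `migrate_plan` and all argument values are hypothetical placeholders. The `--from db wal` form follows the ceph-volume man page example, and the systemd unit name `ceph-osd@<id>` assumes a package-based install. As far as the thread indicates, `ceph-volume lvm migrate` handles the LVM tags itself, unlike ceph-bluestore-tool, but this has not been verified here.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for moving DB+WAL back onto the main
# (block) device of one non-containerized OSD. Nothing is executed.
# All argument values are placeholders, not values from a real cluster.

migrate_plan() {
    osd_id="$1"      # numeric OSD id, e.g. 8
    osd_fsid="$2"    # OSD fsid, e.g. as shown by "ceph-volume lvm list"
    target_lv="$3"   # vg/lv of the OSD's block device

    # Keep the cluster from marking the OSD out while it is down.
    echo "ceph osd set noout"
    # The OSD must be offline; the migration cannot be done live.
    echo "systemctl stop ceph-osd@${osd_id}"
    # Move the DB and WAL data onto the block LV.
    echo "ceph-volume lvm migrate --osd-id ${osd_id} --osd-fsid ${osd_fsid} --from db wal --target ${target_lv}"
    # Bring the OSD back and restore normal cluster behaviour.
    echo "systemctl start ceph-osd@${osd_id}"
    echo "ceph osd unset noout"
}

# Example invocation with placeholder values.
migrate_plan 8 "<osd-fsid>" "<vg>/<block-lv>"
```

Printing the commands first makes it easy to review the plan for each OSD that shares the DB device before running anything, and the DB device should only be zapped after every OSD has been migrated and restarted successfully.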

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Eugen Block <eblock@xxxxxx> 
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 <huww98@xxxxxxxxxxx>
Cc: Igor Fedotov <ifedotov@xxxxxxx>; Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
Subject: Re: is it possible to remove the db+wal from an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open any attachment !
________________________________

That's what I did and pasted the results in my previous comments.


Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:

> Yes. And the “cephadm shell” command does not depend on the running 
> daemon; it starts a new container. So I think it is perfectly fine 
> to stop the OSD first, then run the “cephadm shell” command and run 
> ceph-volume in the new shell.
>
> From: Eugen Block<mailto:eblock@xxxxxx>
> Sent: September 29, 2021 21:40
> To: 胡 玮文<mailto:huww98@xxxxxxxxxxx>
> Cc: Igor Fedotov<mailto:ifedotov@xxxxxxx>; Szabo, Istvan 
> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
> Subject: Re: is it possible to remove the db+wal from an external device 
> (nvme)
>
> The OSD has to be stopped in order to migrate DB/WAL; it can't be done 
> live. ceph-volume requires a lock on the device.
>
>
> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>
>> I’ve not tried it, but how about:
>>
>> cephadm shell -n osd.0
>>
>> then run “ceph-volume” commands in the newly opened shell. The 
>> directory structure seems fine.
>>
>> $ sudo cephadm shell -n osd.0
>> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
>> Inferring config /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
>> Using recent ceph image cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
>> root@host0:/# ll /var/lib/ceph/osd/ceph-0/
>> total 68
>> drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
>> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
>> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
>> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
>> -rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
>> -rw------- 1 ceph ceph  387 Jun 21 13:24 config
>> -rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
>> -rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
>> -rw------- 1 ceph ceph    6 Sep 20 04:15 ready
>> -rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
>> -rw------- 1 ceph ceph   10 Sep 20 04:15 type
>> -rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
>> -rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
>> -rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
>> -rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
>> -rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
>> -rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
>> -rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
>> -rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
>>
>> From: Eugen Block<mailto:eblock@xxxxxx>
>> Sent: September 29, 2021 21:29
>> To: Igor Fedotov<mailto:ifedotov@xxxxxxx>
>> Cc: 胡 玮文<mailto:huww98@xxxxxxxxxxx>; Szabo, Istvan 
>> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>> Subject: Re: Re: Re: [ceph-users] Re: is it possible to 
>> remove the db+wal from an external device (nvme)
>>
>> Hi Igor,
>>
>> thanks for your input. I haven't done this in a prod env yet either, 
>> I'm still playing around in a virtual lab env.
>> I tried the symlink suggestion, but it's not that easy because the 
>> layout underneath the ceph directory differs from what ceph-volume 
>> expects. These are the services underneath:
>>
>> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
>> total 48
>> drwx------ 3 root       root   4096 16. Sep 16:11 alertmanager.ses7-host1
>> drwx------ 3 ceph       ceph   4096 29. Sep 09:03 crash
>> drwx------ 2 ceph       ceph   4096 16. Sep 16:39 crash.ses7-host1
>> drwx------ 4 messagebus lp     4096 16. Sep 16:23 grafana.ses7-host1
>> drw-rw---- 2 root       root   4096 24. Aug 10:00 home
>> drwx------ 2 ceph       ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
>> drwx------ 3 ceph       ceph   4096 16. Sep 16:37 mon.ses7-host1
>> drwx------ 2 nobody     nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
>> drwx------ 2 ceph       ceph   4096 29. Sep 08:43 osd.0
>> drwx------ 2 ceph       ceph   4096 29. Sep 15:11 osd.1
>> drwx------ 4 root       root   4096 16. Sep 16:12 prometheus.ses7-host1
>>
>>
>> While the directory in a non-containerized deployment looks like this:
>>
>> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
>> total 24
>> lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block -> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
>> -rw------- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
>> -rw------- 1 ceph ceph 37 29. Sep 12:21 fsid
>> -rw------- 1 ceph ceph 55 29. Sep 12:21 keyring
>> -rw------- 1 ceph ceph  6 29. Sep 12:21 ready
>> -rw------- 1 ceph ceph 10 29. Sep 12:21 type
>> -rw------- 1 ceph ceph  2 29. Sep 12:21 whoami
>>
>>
>> But even if I create the symlink to the osd directory it fails 
>> because I only have ceph-volume within the containers where the 
>> symlink is not visible to cephadm.
>>
>>
>> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
>> lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>>
>> ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>> Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
>> [...]
>> /usr/bin/podman: stderr --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
>> /usr/bin/podman: stderr  stdout: inferring bluefs devices from bluestore path
>> /usr/bin/podman: stderr  stderr: can't migrate /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
>> /usr/bin/podman: stderr --> Failed to migrate device, error code:1
>> /usr/bin/podman: stderr --> Undoing lv tag set
>> /usr/bin/podman: stderr Failed to migrate to : ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>> Traceback (most recent call last):
>>    File "/usr/sbin/cephadm", line 6225, in <module>
>>      r = args.func()
>>    File "/usr/sbin/cephadm", line 1363, in _infer_fsid
>>      return func()
>>    File "/usr/sbin/cephadm", line 1422, in _infer_image
>>      return func()
>>    File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
>>      out, err, code = call_throws(c.run_cmd(), verbosity=CallVerbosity.VERBOSE)
>>    File "/usr/sbin/cephadm", line 1101, in call_throws
>>      raise RuntimeError('Failed command: %s' % ' '.join(command)) 
>> [...]
>>
>>
>> I could install the package ceph-osd (where ceph-volume is packaged
>> in) but it's not available by default (as you see this is a SES 7 
>> environment).
>>
>> I'm not sure what the design is here, it feels like the ceph-volume 
>> migrate command is not applicable to containers yet.
>>
>> Regards,
>> Eugen
>>
>>
>> Quoting Igor Fedotov <ifedotov@xxxxxxx>:
>>
>>> Hi Eugen,
>>>
>>> indeed, this looks like an issue related to the containerized 
>>> deployment: "ceph-volume lvm migrate" expects the OSD folder to be under
>>> /var/lib/ceph/osd:
>>>
>>>> stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
>>>
>>> As a workaround you might want to try to create a symlink to your 
>>> actual location before issuing the migrate command:
>>> /var/lib/ceph/osd ->
>>> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
>>>
>>> A more complicated (and more general, IMO) way would be to run the 
>>> migrate command from within a container deployed similarly to the 
>>> ceph-osd one (i.e. with all the proper subfolder mappings). Just 
>>> speculating - I'm not a big expert in containers and have never tried 
>>> that with a properly deployed production cluster...
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 9/29/2021 10:07 AM, Eugen Block wrote:
>>>> Hi,
>>>>
>>>> I just tried with 'ceph-volume lvm migrate' in Octopus but it 
>>>> doesn't really work. I'm not sure if I'm missing something here, 
>>>> but I believe it's again the already discussed containers issue. To 
>>>> be able to run the command for an OSD the OSD has to be offline, 
>>>> but then you don't have access to the block.db because the path is 
>>>> different from outside the container:
>>>>
>>>> ---snip---
>>>> [ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>>> --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target: /var/lib/ceph/osd/ceph-1/block
>>>>  stdout: inferring bluefs devices from bluestore path
>>>>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180 time 2021-09-29T06:56:24.790161+0000
>>>>  stderr: /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: 6876: FAILED ceph_assert(r == 0)
>>>>  stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11) Resource temporarily unavailable
>>>>
>>>>
>>>> # path outside
>>>> host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>>>> total 60
>>>> lrwxrwxrwx 1 ceph ceph   93 29. Sep 08:43 block -> /dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
>>>> lrwxrwxrwx 1 ceph ceph   90 29. Sep 08:43 block.db -> /dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f
>>>> ---snip---
>>>>
>>>>
>>>> But if I shut down the OSD I can't access the block and block.db 
>>>> devices. I'm not even sure how this is supposed to work with 
>>>> cephadm. Maybe I'm misunderstanding, though. Or is there a way to 
>>>> provide the offline block.db path to 'ceph-volume lvm migrate'?
>>>>
>>>>
>>>>
>>>> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
>>>>
>>>>> You may need to use “ceph-volume lvm migrate” [1] instead of 
>>>>> ceph-bluestore-tool. If I recall correctly, this is a pretty new 
>>>>> feature; I’m not sure whether it is available in your version.
>>>>>
>>>>> If you use ceph-bluestore-tool, then you need to modify the LVM 
>>>>> tags manually. Please refer to the previous threads, e.g. [2] and 
>>>>> some more.
>>>>>
>>>>> [1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
>>>>> [2]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
>>>>>
>>>>> From: Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>
>>>>> Sent: September 28, 2021 18:20
>>>>> To: Eugen Block<mailto:eblock@xxxxxx>; 
>>>>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
>>>>> Subject: Re: is it possible to remove the db+wal from an 
>>>>> external device (nvme)
>>>>>
>>>>> I gave it a try, and all 3 OSDs eventually failed :/ Not sure 
>>>>> what went wrong.
>>>>>
>>>>> I did the normal maintenance steps (ceph osd set noout, ceph osd set 
>>>>> norebalance), stopped the OSD and ran this command:
>>>>> ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-0/block --devs-source /var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/
>>>>> Output:
>>>>> device removed:1 /var/lib/ceph/osd/ceph-8/block.db
>>>>> device added: 1 /dev/dm-2
>>>>>
>>>>> When I tried to start the OSD I got this in the log:
>>>>> osd.8 0 OSD:init: unable to mount object store
>>>>>  ** ERROR: osd init failed: (13) Permission denied
>>>>> set uid:gid to 167:167 (ceph:ceph)
>>>>> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable), process ceph-osd, pid 1512261
>>>>> pidfile_write: ignore empty --pid-file
>>>>>
>>>>> On the other 2 OSDs the block.db was removed and I could start them 
>>>>> again. I then zapped the db drive to remove it from the device 
>>>>> completely, and after a machine restart neither of these 2 OSDs came 
>>>>> back, I guess because the db device was missing.
>>>>>
>>>>> Are there any steps missing?
>>>>> 1. Set noout + norebalance.
>>>>> 2. Stop the OSD.
>>>>> 3. Migrate the block.db to the block with the above command.
>>>>> 4. Repeat on the other OSDs that share the db device to be 
>>>>> removed.
>>>>> 5. Zap the db device.
>>>>> 6. Start the OSDs again.
>>>>>
>>>>> Istvan Szabo
>>>>> Senior Infrastructure Engineer
>>>>> ---------------------------------------------------
>>>>> Agoda Services Co., Ltd.
>>>>> e: istvan.szabo@xxxxxxxxx
>>>>> ---------------------------------------------------
>>>>>
>>>>> -----Original Message-----
>>>>> From: Eugen Block <eblock@xxxxxx>
>>>>> Sent: Monday, September 27, 2021 7:42 PM
>>>>> To: ceph-users@xxxxxxx
>>>>> Subject:  Re: is it possible to remove the db+wal from 
>>>>> an external device (nvme)
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use 
>>>>> here. I haven't tried it in a production environment yet, only in 
>>>>> virtual labs.
>>>>>
>>>>> Regards,
>>>>> Eugen
>>>>>
>>>>>
>>>>> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It seems that in our config the NVMe device used as WAL+DB in front 
>>>>>> of the SSDs is slowing down the SSD OSDs.
>>>>>> I'd like to avoid rebuilding all the OSDs. Is there a way to 
>>>>>> migrate the WAL+DB to the "slower device" without reinstalling?
>>>>>>
>>>>>> Ty
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send 
>>>>>> an email to ceph-users-leave@xxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



