Hi, I'm curious - how did you tell that the separate WAL+DB volume was
slowing things down? I assume you did some benchmarking - is there any
chance you'd be willing to share results? (Or anybody else who's been in a
similar situation.)

What sorts of devices are you using for the WAL+DB, versus the data disks?
We're using NAND SSDs, with Optanes for the WAL+DB, and on some systems I
am seeing slower-than-expected behaviour - I need to dive deeper into it.

In my case, I was running with 4 or 2 OSDs per Optane volume:
https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partitions_can_you_run_per_optane/
but I couldn't seem to get the results I'd expected - so I'm curious what
people are seeing in the real world - and of course, we might need to
follow the steps here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block <eblock@xxxxxx> wrote:

> Yes, I believe for you it should work without containers, although I
> haven't tried the migrate command in a non-containerized cluster yet.
> But I believe this is a general issue for containerized clusters with
> regards to maintenance. I haven't checked yet if there are existing
> tracker issues for this, but maybe it would be worth creating one?
>
>
> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
>
> > Actually I don't have a containerized deployment, mine is a normal one.
> > So the lvm migrate should work.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---------------------------------------------------
> > Agoda Services Co., Ltd.
> > e: istvan.szabo@xxxxxxxxx
> > ---------------------------------------------------
> >
> > -----Original Message-----
> > From: Eugen Block <eblock@xxxxxx>
> > Sent: Wednesday, September 29, 2021 8:49 PM
> > To: 胡 玮文 <huww98@xxxxxxxxxxx>
> > Cc: Igor Fedotov <ifedotov@xxxxxxx>; Szabo, Istvan (Agoda)
> > <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
> > Subject: Re: is it possible to remove the db+wal from an external
> > device (nvme)
> >
> > Email received from the internet. If in doubt, don't click any link
> > nor open any attachment !
> > ________________________________
> >
> > That's what I did and pasted the results in my previous comments.
> >
> >
> > Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
> >
> >> Yes. And the “cephadm shell” command does not depend on the running
> >> daemon, it will start a new container. So I think it is perfectly fine
> >> to stop the OSD first, then run the “cephadm shell” command, and run
> >> ceph-volume in the new shell.
> >>
> >> From: Eugen Block<mailto:eblock@xxxxxx>
> >> Sent: September 29, 2021 21:40
> >> To: 胡 玮文<mailto:huww98@xxxxxxxxxxx>
> >> Cc: Igor Fedotov<mailto:ifedotov@xxxxxxx>; Szabo, Istvan
> >> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
> >> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
> >> Subject: Re: is it possible to remove the db+wal from an external
> >> device (nvme)
> >>
> >> The OSD has to be stopped in order to migrate DB/WAL, it can't be done
> >> live. ceph-volume requires a lock on the device.
> >>
> >>
> >> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
> >>
> >>> I’ve not tried it, but how about:
> >>>
> >>> cephadm shell -n osd.0
> >>>
> >>> then run “ceph-volume” commands in the newly opened shell. The
> >>> directory structure seems fine.
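
For reference, the sequence being suggested here would look something like
the following (a rough, untested sketch - the OSD id, FSID and VG/LV names
are placeholders to adapt to your cluster):

   ceph osd set noout
   ceph orch daemon stop osd.0      # stop the containerized OSD daemon
   cephadm shell -n osd.0           # new container with osd.0's directory mounted
   # inside that shell:
   ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> \
       --from db --target <vg>/<osd-0-data-lv>
   exit
   ceph orch daemon start osd.0
   ceph osd unset noout
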
> >>>
> >>> $ sudo cephadm shell -n osd.0
> >>> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
> >>> Inferring config
> >>> /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
> >>> Using recent ceph image
> >>> cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
> >>> root@host0:/# ll /var/lib/ceph/osd/ceph-0/
> >>> total 68
> >>> drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
> >>> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
> >>> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
> >>> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
> >>> -rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
> >>> -rw------- 1 ceph ceph  387 Jun 21 13:24 config
> >>> -rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
> >>> -rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
> >>> -rw------- 1 ceph ceph    6 Sep 20 04:15 ready
> >>> -rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
> >>> -rw------- 1 ceph ceph   10 Sep 20 04:15 type
> >>> -rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
> >>> -rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
> >>> -rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
> >>> -rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
> >>> -rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
> >>> -rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
> >>> -rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
> >>> -rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
> >>>
> >>> From: Eugen Block<mailto:eblock@xxxxxx>
> >>> Sent: September 29, 2021 21:29
> >>> To: Igor Fedotov<mailto:ifedotov@xxxxxxx>
> >>> Cc: 胡 玮文<mailto:huww98@xxxxxxxxxxx>; Szabo, Istvan
> >>> (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>;
> >>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
> >>> Subject: Re: Re: Re: [ceph-users] Re: is it possible to
> >>> remove the db+wal from an external device (nvme)
> >>>
> >>> Hi Igor,
> >>>
> >>> thanks for your input. I haven't done this in a prod env yet either,
> >>> still playing around in a virtual lab env.
> >>> I tried the symlink suggestion but it's not that easy, because it
> >>> looks different underneath the ceph directory than ceph-volume
> >>> expects it. These are the services underneath:
> >>>
> >>> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
> >>> insgesamt 48
> >>> drwx------ 3 root       root   4096 16. Sep 16:11 alertmanager.ses7-host1
> >>> drwx------ 3 ceph       ceph   4096 29. Sep 09:03 crash
> >>> drwx------ 2 ceph       ceph   4096 16. Sep 16:39 crash.ses7-host1
> >>> drwx------ 4 messagebus lp     4096 16. Sep 16:23 grafana.ses7-host1
> >>> drw-rw---- 2 root       root   4096 24. Aug 10:00 home
> >>> drwx------ 2 ceph       ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
> >>> drwx------ 3 ceph       ceph   4096 16. Sep 16:37 mon.ses7-host1
> >>> drwx------ 2 nobody     nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
> >>> drwx------ 2 ceph       ceph   4096 29. Sep 08:43 osd.0
> >>> drwx------ 2 ceph       ceph   4096 29. Sep 15:11 osd.1
> >>> drwx------ 4 root       root   4096 16. Sep 16:12 prometheus.ses7-host1
> >>>
> >>>
> >>> While the directory in a non-containerized deployment looks like this:
> >>>
> >>> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
> >>> insgesamt 24
> >>> lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block ->
> >>> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
> >>> -rw------- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
> >>> -rw------- 1 ceph ceph 37 29. Sep 12:21 fsid
> >>> -rw------- 1 ceph ceph 55 29. Sep 12:21 keyring
> >>> -rw------- 1 ceph ceph  6 29. Sep 12:21 ready
> >>> -rw------- 1 ceph ceph 10 29. Sep 12:21 type
> >>> -rw------- 1 ceph ceph  2 29. Sep 12:21 whoami
> >>>
> >>>
> >>> But even if I create the symlink to the osd directory it fails,
> >>> because I only have ceph-volume within the containers, where the
> >>> symlink is not visible to cephadm.
> >>>
> >>>
> >>> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
> >>> lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 ->
> >>> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
> >>>
> >>> ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid
> >>> b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
> >>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> >>> Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
> >>> [...]
> >>> /usr/bin/podman: stderr --> Migrate to existing, Source:
> >>> ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target:
> >>> /var/lib/ceph/osd/ceph-1/block
> >>> /usr/bin/podman: stderr stdout: inferring bluefs devices from
> >>> bluestore path
> >>> /usr/bin/podman: stderr stderr: can't migrate
> >>> /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
> >>> /usr/bin/podman: stderr --> Failed to migrate device, error code:1
> >>> /usr/bin/podman: stderr --> Undoing lv tag set
> >>> /usr/bin/podman: stderr Failed to migrate to :
> >>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> >>> Traceback (most recent call last):
> >>>   File "/usr/sbin/cephadm", line 6225, in <module>
> >>>     r = args.func()
> >>>   File "/usr/sbin/cephadm", line 1363, in _infer_fsid
> >>>     return func()
> >>>   File "/usr/sbin/cephadm", line 1422, in _infer_image
> >>>     return func()
> >>>   File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
> >>>     out, err, code = call_throws(c.run_cmd(),
> >>> verbosity=CallVerbosity.VERBOSE)
> >>>   File "/usr/sbin/cephadm", line 1101, in call_throws
> >>>     raise RuntimeError('Failed command: %s' % ' '.join(command))
> >>> [...]
> >>>
> >>>
> >>> I could install the package ceph-osd (where ceph-volume is packaged)
> >>> but it's not available by default (as you see, this is a SES 7
> >>> environment).
> >>>
> >>> I'm not sure what the design is here, it feels like the ceph-volume
> >>> migrate command is not applicable to containers yet.
> >>>
> >>> Regards,
> >>> Eugen
> >>>
> >>>
> >>> Quoting Igor Fedotov <ifedotov@xxxxxxx>:
> >>>
> >>>> Hi Eugen,
> >>>>
> >>>> indeed this looks like an issue related to containerized deployment,
> >>>> "ceph-volume lvm migrate" expects the osd folder to be under
> >>>> /var/lib/ceph/osd:
> >>>>
> >>>>> stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1
> >>>>> bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock
> >>>>> /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still
> >>>>> running?)(11) Resource temporarily unavailable
> >>>>
> >>>> As a workaround you might want to try to create a symlink to your
> >>>> actual location before issuing the migrate command:
> >>>> /var/lib/ceph/osd -> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
> >>>>
> >>>> A more complicated (and more general, IMO) way would be to run the
> >>>> migrate command from within a container deployed similarly (i.e.
> >>>> with all the proper subfolder mappings) to the ceph-osd one. Just
> >>>> speculating - I'm not a big expert in containers and have never
> >>>> tried that with a properly deployed production cluster...
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Igor
> >>>>
> >>>> On 9/29/2021 10:07 AM, Eugen Block wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I just tried with 'ceph-volume lvm migrate' in Octopus but it
> >>>>> doesn't really work. I'm not sure if I'm missing something here,
> >>>>> but I believe it's again the already discussed containers issue. To
> >>>>> be able to run the command for an OSD the OSD has to be offline,
> >>>>> but then you don't have access to the block.db because the path is
> >>>>> different from outside the container:
> >>>>>
> >>>>> ---snip---
> >>>>> [ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid
> >>>>> b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
> >>>>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> >>>>> --> Migrate to existing, Source:
> >>>>> ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target:
> >>>>> /var/lib/ceph/osd/ceph-1/block
> >>>>> stdout: inferring bluefs devices from bluestore path
> >>>>> stderr:
> >>>>> /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc:
> >>>>> In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180
> >>>>> time 2021-09-29T06:56:24.790161+0000
> >>>>> stderr:
> >>>>> /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc:
> >>>>> 6876: FAILED ceph_assert(r == 0)
> >>>>> stderr: 2021-09-29T06:56:24.787+0000 7fde05b96180 -1
> >>>>> bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock
> >>>>> /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still
> >>>>> running?)(11) Resource temporarily unavailable
> >>>>>
> >>>>>
> >>>>> # path outside
> >>>>> host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
> >>>>> insgesamt 60
> >>>>> lrwxrwxrwx 1 ceph ceph 93 29. Sep 08:43 block ->
> >>>>> /dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
> >>>>> lrwxrwxrwx 1 ceph ceph 90 29. Sep 08:43 block.db ->
> >>>>> /dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f
> >>>>> ---snip---
> >>>>>
> >>>>>
> >>>>> But if I shut down the OSD I can't access the block and block.db
> >>>>> devices. I'm not even sure how this is supposed to work with
> >>>>> cephadm. Maybe I'm misunderstanding, though. Or is there a way to
> >>>>> provide the offline block.db path to 'ceph-volume lvm migrate'?
> >>>>>
> >>>>>
> >>>>>
> >>>>> Quoting 胡 玮文 <huww98@xxxxxxxxxxx>:
> >>>>>
> >>>>>> You may need to use `ceph-volume lvm migrate' [1] instead of
> >>>>>> ceph-bluestore-tool. If I recall correctly, this is a pretty new
> >>>>>> feature, I'm not sure whether it is available in your version.
> >>>>>>
> >>>>>> If you use ceph-bluestore-tool, then you need to modify the LVM
> >>>>>> tags manually. Please refer to the previous threads, e.g. [2] and
> >>>>>> some more.
> >>>>>>
> >>>>>> [1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
> >>>>>> [2]:
> >>>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
> >>>>>>
> >>>>>> From: Szabo, Istvan (Agoda)<mailto:Istvan.Szabo@xxxxxxxxx>
> >>>>>> Sent: September 28, 2021 18:20
> >>>>>> To: Eugen Block<mailto:eblock@xxxxxx>;
> >>>>>> ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
> >>>>>> Subject: Re: is it possible to remove the db+wal from an
> >>>>>> external device (nvme)
> >>>>>>
> >>>>>> Gave it a try, and all 3 osds ended up failing :/ Not sure
> >>>>>> what went wrong.
> >>>>>>
> >>>>>> I did the normal maintenance things (ceph osd set noout, ceph osd
> >>>>>> set norebalance), stopped the osd and ran this command:
> >>>>>> ceph-bluestore-tool bluefs-bdev-migrate --dev-target
> >>>>>> /var/lib/ceph/osd/ceph-0/block --devs-source
> >>>>>> /var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/
> >>>>>> Output:
> >>>>>> device removed:1 /var/lib/ceph/osd/ceph-8/block.db
> >>>>>> device added: 1 /dev/dm-2
> >>>>>>
> >>>>>> When I tried to start it I got this in the log:
> >>>>>> osd.8 0 OSD:init: unable to mount object store
> >>>>>> ** ERROR: osd init failed: (13) Permission denied
> >>>>>> set uid:gid to 167:167 (ceph:ceph)
> >>>>>> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2)
> >>>>>> octopus (stable), process ceph-osd, pid 1512261
> >>>>>> pidfile_write: ignore empty --pid-file
> >>>>>>
> >>>>>> On the other 2 osds the block.db was removed and I could start them
> >>>>>> back up. I zapped the db drive just to remove it from the device
> >>>>>> completely, and after a machine restart none of these 2 osds came
> >>>>>> back - I guess they're missing the db device.
> >>>>>>
> >>>>>> Are any steps missing?
> >>>>>> 1. Noout+norebalance
> >>>>>> 2. Stop osd
> >>>>>> 3. Migrate the block.db to the block with the above command.
> >>>>>> 4. Do the same on the other osds which share the db device that I
> >>>>>> want to remove.
> >>>>>> 5. Zap the db device
> >>>>>> 6. Start the osds back up.
> >>>>>>
> >>>>>> Istvan Szabo
> >>>>>> Senior Infrastructure Engineer
> >>>>>> ---------------------------------------------------
> >>>>>> Agoda Services Co., Ltd.
> >>>>>> e: istvan.szabo@xxxxxxxxx
> >>>>>> ---------------------------------------------------
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Eugen Block <eblock@xxxxxx>
> >>>>>> Sent: Monday, September 27, 2021 7:42 PM
> >>>>>> To: ceph-users@xxxxxxx
> >>>>>> Subject: Re: is it possible to remove the db+wal from
> >>>>>> an external device (nvme)
> >>>>>>
> >>>>>> Email received from the internet. If in doubt, don't click any
> >>>>>> link nor open any attachment !
> >>>>>> ________________________________
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use
> >>>>>> here. I haven't tried it in a production environment yet, only in
> >>>>>> virtual labs.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Eugen
> >>>>>>
> >>>>>>
> >>>>>> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> It seems like in our config the nvme device as a wal+db in front
> >>>>>>> of the ssds is slowing down the ssd osds.
> >>>>>>> I'd like to avoid rebuilding all the osds - is there a way to
> >>>>>>> somehow migrate the wal+db to the "slower device" without a
> >>>>>>> reinstall?
> >>>>>>>
> >>>>>>> Ty
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
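
For anyone finding this thread later, a rough consolidated sketch of the
removal procedure discussed above, assuming a non-containerized OSD (osd.8
as an example) whose DB is being folded back into the data LV - all ids,
VG/LV names and device paths are placeholders, adapt before use:

   ceph osd set noout
   systemctl stop ceph-osd@8

   # on releases that ship it (e.g. the 15.2.14 build mentioned above),
   # let ceph-volume move the DB and update the LVM tags in one step:
   ceph-volume lvm migrate --osd-id 8 --osd-fsid <osd-fsid> \
       --from db --target <data-vg>/<data-lv>

   systemctl start ceph-osd@8
   ceph osd unset noout

   # only once every OSD sharing the NVMe has been migrated:
   ceph-volume lvm zap /dev/<nvme-db-device> --destroy

If only 'ceph-bluestore-tool bluefs-bdev-migrate' is available, the
db-related LVM tags on the data LV (ceph.db_device / ceph.db_uuid, check
with 'lvs -o +lv_tags') have to be cleaned up by hand afterwards, and a
'(13) Permission denied' at the next start is often just the new block
symlink target being owned by root instead of ceph:ceph.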