Hello,
as we are just trying something similar, I will jump into this as well:
Our initial setup put the WAL/DB on SSDs that turned out to be too small, so
we need to move it back to the HDDs and replace the SSDs with larger ones.
After that we will decide whether we just keep it that way - it is an archive
cluster without big performance demands - and use the SSDs for a full-flash
data pool, or migrate the DB back onto the larger SSDs.
So I just tested how this might go, using ceph-volume.
My steps are inline below:
On 04.02.25 09:52, Eugen Block wrote:
Hi,
in this case I would recommend using ceph-bluestore-tool
instead of ceph-volume, because it cleans up after itself (removes the
block.db symlink and labels). These are the steps to revert that change:
# before reverting two devices are in use:
soc9-ceph:~ # ceph osd metadata 0 | grep devices
"bluefs_db_devices": "vdf",
"bluestore_bdev_devices": "vdb",
"devices": "vdb,vdf",
"objectstore_numa_unknown_devices": "vdb,vdf",
0. # ceph osd set noout
block + block.db links before:
# ls -l /var/lib/ceph/osd/ceph-1406/
total 60
lrwxrwxrwx 1 ceph ceph 111 Jan 28 17:23 block ->
/dev/mapper/ceph--7c3251f6--3307--4636--92c2--0dd4da36d929-osd--block--8b3d79bc--2f32--46fe--84d4--cea620c5990a
lrwxrwxrwx 1 ceph ceph 108 Jan 28 17:23 block.db ->
/dev/mapper/ceph--bdd19323--8061--4253--a14a--d6d2a3bb13c2-osd--db--471ed787--0eda--4862--8cdc--fb9f1746d4ee
(...)
1. soc9-ceph:~ # ceph orch daemon stop osd.0
1. Here I stopped the OSD via systemctl:
# systemctl stop ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service
2. soc9-ceph:~ # cephadm shell --name osd.0
2. # cephadm shell --name osd.1406
3. [ceph: root@soc9-ceph /]# ceph-bluestore-tool --path
/var/lib/ceph/osd/ceph-0/ --command bluefs-bdev-migrate
--devs-source /var/lib/ceph/osd/ceph-0/block.db --dev-target
/var/lib/ceph/osd/ceph-0/block
inferring bluefs devices from bluestore path
device removed:1 /var/lib/ceph/osd/ceph-0/block.db
3. Here I used ceph-volume:
# ceph-volume lvm migrate --osd-id 1406 --osd-fsid
8b3d79bc-2f32-46fe-84d4-cea620c5990a --from db wal --target
ceph-7c3251f6-3307-4636-92c2-0dd4da36d929/osd-block-8b3d79bc-2f32-46fe-84d4-cea620c5990a
--> Migrate to existing, Source: ['--devs-source',
'/var/lib/ceph/osd/ceph-1406/block.db'] Target:
/var/lib/ceph/osd/ceph-1406/block
--> Migration successful.
4. IMPORTANT: Remove the DB LV before you start the OSD, otherwise
the OSD will start to use it again because of the LV tags.
Alternatively, delete the LV tags of the DB LV before starting the
OSD.
4. This I did not do - since the 'block.db' link had disappeared
from the config dir, I assumed the DB LV was no longer referenced
(see the tag-removal sketch after the listing below):
# ls -l /var/lib/ceph/osd/ceph-1406/
total 60
lrwxrwxrwx 1 ceph ceph 111 Jan 28 17:23 block ->
/dev/mapper/ceph--7c3251f6--3307--4636--92c2--0dd4da36d929-osd--block--8b3d79bc--2f32--46fe--84d4--cea620c5990a
(..)
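For next time, I guess the tag cleanup Eugen describes would look roughly
like this (untested sketch on my part; the VG/LV names are those of my DB LV
shown further up, and I am assuming the usual ceph.* tags that ceph-volume
sets on its LVs):
# lvs -o lv_name,vg_name,lv_tags | grep osd-db
# lvchange --deltag "ceph.osd_id=1406" ceph-bdd19323-8061-4253-a14a-d6d2a3bb13c2/osd-db-471ed787-0eda-4862-8cdc-fb9f1746d4ee
(repeat --deltag for the remaining ceph.* tags shown by lvs, or simply
lvremove the DB LV once you are sure the data has been migrated)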
5. soc9-ceph:~ # ceph orch daemon start osd.0
5. Exit from container + restart:
# systemctl start ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service
# check result
soc9-ceph:~ # ceph osd metadata 0 | grep devices
"bluestore_bdev_devices": "vdb",
"devices": "vdb",
"objectstore_numa_unknown_devices": "vdb",
This I did just now - and it seems that the DB device is still listed there:
# ceph osd metadata 1406 | grep devices
"bluefs_db_devices": "md10",
"bluestore_bdev_devices": "sdy",
"devices": "md10,sdy",
"objectstore_numa_unknown_devices": "md10,sdy",
-> So is this OSD now using the old LVM device without having the
'block.db' link in its config directory?
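To check what it is actually using I would look at the bluestore label and
the LV tags, roughly like this (again only a sketch; show-label run inside
'cephadm shell --name osd.1406'):
# ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-1406/block
# ceph-volume lvm list
# lvs -o lv_name,vg_name,lv_tags | grep 1406
If the old DB LV still carries its ceph.* tags, I suppose that would explain
why the OSD re-attached it on restart.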
I will check another OSD with ceph-bluestore-tool (which I actually
tried first, but looking at my shell history I had a typo in the
'--path' argument, which probably caused the error I got: "can't
migrate /var/lib/ceph/osd/ceph-1406/block, not a valid bluefs volume"
- I already had 'block' appended to the --path and did not realize it).
And is there a good way to straighten out osd.1406 from its current state?
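My own guess (please correct me) would be to set noout, stop the OSD again,
re-run the migrate from my step 3 so that anything written to the DB LV in
the meantime gets moved back, and this time clear the tags / remove the DB
LV before restarting - roughly:
# ceph osd set noout
# systemctl stop ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service
# cephadm shell --name osd.1406
# ceph-volume lvm migrate --osd-id 1406 --osd-fsid 8b3d79bc-2f32-46fe-84d4-cea620c5990a --from db wal --target ceph-7c3251f6-3307-4636-92c2-0dd4da36d929/osd-block-8b3d79bc-2f32-46fe-84d4-cea620c5990a
(exit the container, then deltag/remove the DB LV as sketched above)
# systemctl start ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service
# ceph osd metadata 1406 | grep devices
# ceph osd unset noout
But I have not tried this yet.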
Regards,
Holger
Regards,
Eugen
Quoting Jan Kasprzak <kas@xxxxxxxxxx>:
Hi all,
while reading a sibling thread about moving DB/WAL to a separate device,
I wonder whether it is possible to go the other way round as well,
i.e. to remove a metadata device from an OSD and merge the metadata back
into the main storage.
What I am trying to do:
My OSD nodes are 1U boxes with 4 drive bays, two of which support NVMe.
They have two small-ish NVMe drives and two large HDDs.
On NVMe, there is a partition for an OS (configured as RAID-1 across both
NVMe drives), and the rest of each NVMe is used as metadata for one of the
HDD-based OSDs. So I have two OSDs per node.
Now I am considering replacing those small NVMes with much larger ones,
using part of them for the OS as before, part as metadata for the
HDD-based OSDs as before, and the rest as new NVMe-only OSDs.
Can this be done without tearing the original HDD+NVMe-based OSDs down
and recreating them again?
Being able to remove a metadata device from the OSD would help in
this case.
Thanks!
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ ; GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax: +49 431 880-1523
naundorf@xxxxxxxxxxxxxx