Re: Merge DB/WAL back to the main device?

Eugen Block <eblock@xxxxxx> · Tue, 04 Feb 2025 11:54:38 +0000

Hi Holger,

oh yeah, the noout flag is important of course, I missed that in this  
response, although I mentioned it in the other thread. :-)

-> So is this OSD now using the old LVM device without having the  
'block.db' link in its config directory?

No, the block.db symlink is most likely still present after OSD  
restart, check again:

ls -l /var/lib/ceph/e776dd57-7fc6-11ee-9f23-9bb83aca7b4b/osd.1406/block.db

And is there a good way to straighten out the current setup of  
osd.1406 from its current state?

Try my recommended steps again for that OSD, it should work if you use  
the ceph-bluestore-tool and remove the LV before you start the OSD.

Zitat von Holger Naundorf <naundorf@xxxxxxxxxxxxxx>:

Hello,
as we are just trying somthing similar i will jump into this as well:
As our initial setup for the WAL/DB was on too small SSDs we need to  
move it back to the HDD and replace the SSD with larger ones (and  
after that we decide if we just keep it that way - it an archive  
cluster without bit performance demands - and use the SSDs for a  
full flash data pool or remigrate the DB to the larger SSDs)

So I just tested how this might go and use ceph-volume:

My steps are inline below:

On 04.02.25 09:52, Eugen Block wrote:
Hi,

in this case I would recommend to use the ceph-bluestore-tool  
instead of ceph-volume because it cleans up (removes the block.db  
symlink and labels). These are the steps to revert that change:

# before reverting two devices are in use:

soc9-ceph:~ # ceph osd metadata 0 | grep devices
    "bluefs_db_devices": "vdf",
    "bluestore_bdev_devices": "vdb",
    "devices": "vdb,vdf",
    "objectstore_numa_unknown_devices": "vdb,vdf",

0. # ceph osd set noout

block + block.db links before:

# ls -l /var/lib/ceph/osd/ceph-1406/
total 60
lrwxrwxrwx 1 ceph ceph  111 Jan 28 17:23 block ->  
/dev/mapper/ceph--7c3251f6--3307--4636--92c2--0dd4da36d929-osd--block--8b3d79bc--2f32--46fe--84d4--cea620c5990a

lrwxrwxrwx 1 ceph ceph  108 Jan 28 17:23 block.db ->  
/dev/mapper/ceph--bdd19323--8061--4253--a14a--d6d2a3bb13c2-osd--db--471ed787--0eda--4862--8cdc--fb9f1746d4ee

(...)

1. soc9-ceph:~ # ceph orch daemon stop osd.0

1.  Here I stoped the osd via systemctl:

# systemctl stop ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service

2. soc9-ceph:~ # cephadm shell --name osd.0

2. # cephadm shell --name osd.1406

3. [ceph: root@soc9-ceph /]# ceph-bluestore-tool --path  
/var/lib/ceph/osd/ceph-0/ --command bluefs-bdev-migrate  
--devs-source /var/lib/ceph/osd/ceph-0/block.db --dev-target  
/var/lib/ceph/osd/ceph-0/block
inferring bluefs devices from bluestore path
 device removed:1 /var/lib/ceph/osd/ceph-0/block.db

3. Here I used ceph-volume:

# ceph-volume lvm migrate --osd-id 1406 --osd-fsid  
8b3d79bc-2f32-46fe-84d4-cea620c5990a --from db wal --target  
ceph-7c3251f6-3307-4636-92c2-0dd4da36d929/osd-block-8b3d79bc-2f32-46fe-84d4-cea620c5990a --> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-1406/block.db'] Target:  
/var/lib/ceph/osd/ceph-1406/block
--> Migration successful.

4. IMPORTANT: Remove the DB LV before you start the OSD, otherwise  
the OSD will start to use it again because of the LV tags.  
Alternatively, delete the LV tags of the DB LV before starting the  
OSD.

4. This I also did not remove - as the 'block.db' link disappearred  
from the config dir I assumed it was removed:

# ls -l /var/lib/ceph/osd/ceph-1406/
total 60
lrwxrwxrwx 1 ceph ceph  111 Jan 28 17:23 block ->  
/dev/mapper/ceph--7c3251f6--3307--4636--92c2--0dd4da36d929-osd--block--8b3d79bc--2f32--46fe--84d4--cea620c5990a
(..)

5. soc9-ceph:~ # ceph orch daemon start osd.0

5. Exit from container + restart:

# systemctl start ceph-e776dd57-7fc6-11ee-9f23-9bb83aca7b4b@osd.1406.service

# check result

soc9-ceph:~ # ceph osd metadata 0 | grep devices
    "bluestore_bdev_devices": "vdb",
    "devices": "vdb",
    "objectstore_numa_unknown_devices": "vdb",

This I did just now - and it seem that the DB device is still listed there:

# ceph osd metadata 1406 | grep devices
    "bluefs_db_devices": "md10",
    "bluestore_bdev_devices": "sdy",
    "devices": "md10,sdy",
    "objectstore_numa_unknown_devices": "md10,sdy",

-> So is this OSD now using the old LVM device without having the  
'block.db' link in its config directory?

I will check with another OSD with the ceph-bluestore-tool (which I  
actually tried first, but looking at the history I managed to have a  
typo in my '---path' which probably caused the error I got (can't  
migrate /var/lib/ceph/osd/ceph-1406/block, not a valid bluefs volume  
- I managed to have the 'bolck' already in the --path and did not  
realize that)

And is there a good way to straighten out the current setup of  
osd.1406 from its current state?

Regards,
Holger

Regards,
Eugen

Zitat von Jan Kasprzak <kas@xxxxxxxxxx>:

Hi all,

while reading a sibling thread about moving DB/WAL to a separate device,
I wonder whether is it possible to go the other way round as well,
i.e. to remove a metadata device from an OSD and merge metadata back
to the main storage?

What I am trying to do:

My OSD nodes are 1U boxes with 4 drive bays, two of which support NVMe.
They have two small-ish NVMe drives and two large HDDs.
On NVMe, there is a partition for an OS (configured as RAID-1 across both
NVMe drives), and the rest of each NVMe is used as metadata for one of the
HDD-based OSDs. So I have two OSDs per node.

Now I am considering replacing those small NVMes with much larger ones,
and use part of them for an OS as before, part of them as metadata for the
HDD-based OSDs as before, and the rest as new NVMe-only OSDs.
Can this be done without tearing the original HDD+NVMe-based OSDs down
and recreating them again?

Being able to remove a metadata device from the OSD would help in  
this case.

Thanks!

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                       ; GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naundorf@xxxxxxxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx