Re: Question about speeding hdd based cluster

Eugen Block <eblock@xxxxxx> · Tue, 08 Oct 2024 08:26:01 +0000

Sure:

https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/

In this case you'll have to prepare the db LV beforehand. I haven't
done that in a while, so here's an example from Clyso:

https://docs.clyso.com/blog/ceph-volume-create-wal-db-on-separate-device-for-existing-osd

Note that in a cephadm deployment you'll need to execute that in a  
shell, for example:

cephadm shell --name osd.6 \
  --env CEPH_ARGS='--bluestore_block_db_size=1341967564' -- \
  ceph-bluestore-tool bluefs-bdev-new-db \
  --dev-target /dev/data_vg1/lv4 \
  --path /var/lib/ceph/osd/ceph-6

Note that these are two different approaches to achieve the same goal.  
One is via 'ceph-volume lvm new-db', the other one with  
'ceph-bluestore-tool bluefs-bdev-new-db'. I would assume they both  
work, so I can't tell which one to prefer. I feel like the docs could  
use some clarification on this topic.
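
For comparison, the 'ceph-volume lvm new-db' route might look roughly like the sketch below. It is untested and only prints the commands unless DRY_RUN=0 is set. The OSD id and the VG/LV names are reused from the example above. The 50G size, the fsid placeholder, the helper names (run, attach_new_db) and running ceph-volume through 'cephadm shell' are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
set -eu
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# attach_new_db OSD_ID OSD_FSID VG LV SIZE   (hypothetical helper)
# OSD_FSID is the per-OSD fsid, e.g. as listed by 'ceph-volume lvm list'.
attach_new_db() {
    osd_id="$1"; osd_fsid="$2"; vg="$3"; lv="$4"; size="$5"

    run ceph osd set noout                     # keep the stopped OSD from being marked out
    run lvcreate -L "$size" -n "$lv" "$vg"     # prepare the DB LV beforehand
    run ceph orch daemon stop "osd.${osd_id}"  # the OSD has to be offline
    run cephadm shell --name "osd.${osd_id}" -- \
        ceph-volume lvm new-db --osd-id "$osd_id" --osd-fsid "$osd_fsid" \
        --target "${vg}/${lv}"
    run ceph orch daemon start "osd.${osd_id}"
    run ceph osd unset noout
}

# Placeholder fsid and assumed size: replace both before running with DRY_RUN=0.
attach_new_db 6 "OSD_FSID_PLACEHOLDER" data_vg1 lv4 50G
```

Reviewing the printed commands first and only then re-running with DRY_RUN=0 keeps the sketch from touching a cluster by accident.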

> On a similar topic: Does it make sense to use compression on a
> metadata pool?  Would it matter if the metadata pool is on hdd vs ssd?

As already stated, metadata should be on fast devices, independent of
compression. The metadata pool usually doesn't hold much data, so I'd
say there's not much benefit in compressing it.

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

On Oct 7, 2024, at 2:16 AM, Eugen Block <eblock@xxxxxx> wrote:

Hi, response inline.

Quoting "Kyriazis, George" <george.kyriazis@xxxxxxxxx>:

Thank you all.

The cluster is used mostly for backup of large files currently,  
but we are hoping to use it for home directories (compiles, etc.)  
soon.  Most usage would be for large files, though.

What I've observed with its current usage is that both ceph rebalances
and proxmox-initiated VM backups bring the storage to its knees.

Would a safe approach be to move the metadata pool to ssd first,  
see how it goes (since it would be cheaper), and then add DB/WAL  
disks?

Moving the metadata to SSDs first is absolutely reasonable and  
relatively cheap since it usually doesn't contain huge amounts of  
data.
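
A minimal sketch of that move, assuming the SSD OSDs already carry the 'ssd' device class and the CRUSH root is called 'default'. The pool name cephfs_metadata, the rule name replicated_ssd and the helper names are hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
set -eu
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# move_pool_to_ssd POOL RULE   (hypothetical helper)
move_pool_to_ssd() {
    pool="$1"; rule="$2"

    # Replicated rule restricted to OSDs of device class 'ssd',
    # with 'host' as failure domain under the 'default' root (assumed).
    run ceph osd crush rule create-replicated "$rule" default host ssd
    # Point the pool at the new rule; Ceph then backfills it onto the SSDs.
    run ceph osd pool set "$pool" crush_rule "$rule"
}

# Pool and rule names are assumptions, check 'ceph osd pool ls detail' first.
move_pool_to_ssd cephfs_metadata replicated_ssd
```

Since the metadata pool is small, the resulting backfill should be short compared to anything that touches the data pool.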

How would ceph behave if we are adding DB/WAL disks "slowly" (ie  
one node at a time)?  We have about 100 OSDs (mix hdd/ssd) spread  
across about 25 hosts.  Hosts are server-grade with plenty of  
memory and processing power.

The answer is as always "it depends". If you rebuild the OSDs  
entirely (host-wise) instead of migrating the DB off to SSDs, you  
might encounter slow requests as you already noticed yourself. But  
the whole process would be faster than migrating each DB  
individually.
If you take the migration approach, it would be less invasive, each  
OSD would just have to catch up after restart, reducing the load  
drastically compared to a rebuild. But then again, it would take  
way more time to complete. How large are the OSDs and how much are
they utilized? Do you have any history of how long a host rebuild
would usually take?
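
Whichever approach is chosen, working through the hosts one at a time could be gated on cluster health, roughly as sketched below. This is an assumption-laden sketch, not a tested procedure: the helper wait_until_healthy, the HEALTH_CMD variable, the host names and the intervals are all made up for illustration. Note that 'ceph health' reports HEALTH_WARN while the noout flag is set, so the flag would have to be unset before waiting.

```shell
#!/bin/sh
# Sketch only: defines a polling helper, nothing is executed against the cluster here.
set -eu

# Command used to query cluster health; overridable for testing (assumption).
HEALTH_CMD="${HEALTH_CMD:-ceph health}"

# wait_until_healthy INTERVAL_SECONDS MAX_TRIES   (hypothetical helper)
# Returns 0 once the health output starts with HEALTH_OK, 1 if it never does.
wait_until_healthy() {
    interval="${1:-30}"; max_tries="${2:-120}"; tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        status="$($HEALTH_CMD)"
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}

# Possible usage, one host at a time (host names are placeholders):
#   ceph osd df tree     # size and utilization per OSD, grouped by host
#   for host in node01 node02; do
#       ... migrate or rebuild the OSDs on "$host" ...
#       wait_until_healthy 30 240 || { echo "not healthy after $host" >&2; exit 1; }
#   done
```

Timing the first host this way would also give the "how long does a host take" number asked about above.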

I have no problem destroying and re-creating the OSDs (in place) if  
that’s what it takes.  It will take time to do them all, but if  
“eventually” it works better, then so be it.  Do you happen to have  
a documentation pointer on how to migrate the DB to SSDs?

On a similar topic: Does it make sense to use compression on a  
metadata pool?  Would it matter if the metadata pool is on hdd vs ssd?

Thank you!

George

Thank you!

George

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, October 2, 2024 2:18 AM
To: ceph-users@xxxxxxx
Subject:  Re: Question about speeding hdd based cluster

Hi George,

the docs [0] strongly recommend having dedicated SSD or NVMe OSDs for
the metadata pool. You'll also benefit from dedicated DB/WAL devices.
But as Joachim already stated, it depends on a couple of factors like the
number of clients, the load they produce, file sizes etc. There's
no easy answer.

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/cephfs/createfs/#creating-pools

Quoting Joachim Kraftmayer <joachim.kraftmayer@xxxxxxxxx>:

> Hi Kyriazis,
>
> depends on the workload.
> I would recommend adding an ssd/nvme DB/WAL device to each osd.
>
>
>
> Joachim Kraftmayer
>
> Kyriazis, George <george.kyriazis@xxxxxxxxx> wrote on Wed, Oct 2, 2024, 07:37:
>
>> Hello ceph-users,
>>
>> I’ve been wondering…. I have a proxmox hdd-based cephfs pool with no
>> DB/WAL drives.  I also have ssd drives in this setup used for other pools.
>>
>> What would increase the speed of the hdd-based cephfs more, and in
>> what usage scenarios:
>>
>> 1. Adding ssd/nvme DB/WAL drives for each node
>> 2. Moving the metadata pool for my cephfs to ssd
>> 3. Increasing the performance of the network.  I currently have
>>    10gbe links.
>>
>> It doesn’t look like the network is currently saturated, so I’m
>> thinking (3) is not a solution.  However, if I choose any of the other
>> options, would I need to also upgrade the network so that the network
>> does not become a bottleneck?
>>
>> Thank you!
>>
>> George
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
>> email to ceph-users-leave@xxxxxxx
>>