Re: [EXTERNAL] Custom CRUSH maps HOWTO?

I’m going to start by assuming your pool(s) are deployed with the default 3 replicas and a min_size of 2.

The quickest and safest thing you can do to potentially realize some improvement is to set the primary-affinity for all of your HDD-based OSDs to zero.
https://docs.ceph.com/en/quincy/rados/operations/crush-map/#primary-affinity

Something like:
for osd in $(ceph osd ls-tree SAS-NODE1); do ceph osd primary-affinity $osd 0.0; done

And of course repeat that for the other node.
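
For example, assuming the second SAS host's bucket is named SAS-NODE2 (check ceph osd tree for the real bucket name):

for osd in $(ceph osd ls-tree SAS-NODE2); do ceph osd primary-affinity $osd 0.0; done

You can confirm it took effect in the PRI-AFF column of ceph osd tree.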

That will have low impact on your users, as ceph will start prioritizing reads from the fast NVMEs, and the slow ones will only have to do writes.  However, ceph may already be doing that, and if your SAS-based hosts do not have fast disks for the block DB and WAL (write-ahead log), any time 2 (or more) SAS disks are involved in a PG, your writes will still be no faster than the slowest HDD involved.

It is best when ceph has OSDs of identical size and performance.  When you're going to mix very fast disks with relatively slow disks, the next best thing is to have twice as much fast storage as slow.  If you have enough capacity available such that the total data STORED (add it up from ceph df) is < 3.84*4*2*0.7 = ~21.5TB, I'd suggest creating rack buckets in your crush map, so there are 3 racks, each with 2 hosts, with both SAS hosts sharing one rack so that each PG will only have one slow disk.  The downside to that is, you are basically abandoning ~50TB of HDD capacity, your effective maximum RAW capacity ends up only ~92TB, and you'll start getting near-full warnings between 75 and 80TB RAW, or around 25-27TB stored.
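
To spell out that arithmetic with the drive sizes you listed: each NVME host is 4 x 3.84TB = 15.36TB raw, so a rack of 2 NVME hosts is 30.72TB raw.  With 3 replicas and rack as the failure domain, each rack holds one full copy of the data, so the smallest rack caps you at ~30.72TB stored, or ~21.5TB if you only plan to fill it to ~70%.  Effective RAW is 3 x 30.72 = ~92TB, while the SAS rack alone is 2 x 24 x 1.68 = ~80.6TB raw, of which only ~30.7TB will ever hold data, hence the ~50TB abandoned.  With the default nearfull ratio of 0.85, the warnings start around 0.85 x 30.72 = ~26TB stored, which is roughly 78TB of RAW used.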

The process for setting that would be adding 3 rack buckets, and then moving the host buckets into the rack buckets:
https://docs.ceph.com/en/quincy/rados/operations/crush-map/#add-a-bucket
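
Something like the following, assuming your crush root is the default one named "default", rack bucket names rack1-3, and NVME host buckets named NVME-NODE1 through NVME-NODE4 (use whatever names ceph osd tree actually shows):

ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush add-bucket rack3 rack
ceph osd crush move rack1 root=default
ceph osd crush move rack2 root=default
ceph osd crush move rack3 root=default
# two NVME hosts in each of the fast racks
ceph osd crush move NVME-NODE1 rack=rack1
ceph osd crush move NVME-NODE2 rack=rack1
ceph osd crush move NVME-NODE3 rack=rack2
ceph osd crush move NVME-NODE4 rack=rack2
# both SAS hosts share the third rack, so each PG only gets one slow disk
ceph osd crush move SAS-NODE1 rack=rack3
ceph osd crush move SAS-NODE2 rack=rack3

For the racks to actually matter, the pool also needs a crush rule that splits replicas across racks instead of hosts, e.g.:

ceph osd crush rule create-replicated replicated_rack default rack
ceph osd pool set <your-pool> crush_rule replicated_rack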

That will cause a lot of data movement, so you should try to do it at a time when client i/o is expected to be low.  Ceph will do its best to limit the impact on client i/o caused by this backfill, but if your writes are already poor, they'll definitely be worse during the movement.
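
One common approach (just a sketch, adjust to your own comfort level) is to set the norebalance flag before making the bucket moves, then clear it during a quiet window and watch the backfill:

ceph osd set norebalance       # make all the crush changes without data moving yet
# ... add the rack buckets and move the hosts as above ...
ceph osd unset norebalance     # let the backfill start in your quiet window
ceph -s                        # watch overall recovery/backfill progress
ceph osd pool stats            # per-pool recovery rates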

If that capacity is going to be an issue, the recommended fixes get more complicated and risky.  However, the best thing you can do, even if you do add the suggested racks to your crush map, would be to get 2 NVMEs (or SSDs) for each of your SAS hosts to serve as db_devices for the HDDs.  You’ll have to remove and recreate those OSDs, but you can do them in smaller batches.
https://docs.ceph.com/en/quincy/cephadm/services/osd/#creating-new-osds
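
As a rough sketch only (the service_id and the rotational filters here are assumptions, and SAS-NODE2 is a guessed hostname; check ceph orch device ls and your actual host names first), a cephadm OSD spec along these lines would put the block DB for the HDD OSDs onto the new flash devices:

cat > sas-osd-spec.yml <<'EOF'
service_type: osd
service_id: sas_hdd_with_flash_db
placement:
  hosts:
    - SAS-NODE1
    - SAS-NODE2
spec:
  data_devices:
    rotational: 1     # the 24 SAS HDDs per host
  db_devices:
    rotational: 0     # the new NVME/SSD devices
EOF
ceph orch apply -i sas-osd-spec.yml

You'd drain and remove the existing HDD OSDs in small batches first (ceph orch osd rm <id>, then zap the freed device so it can be reused), and cephadm will redeploy them against the spec with the DB/WAL on flash.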

There is a GUI ceph dashboard available.  https://docs.ceph.com/en/quincy/mgr/dashboard/
It is very limited in the changes that can be made, and these types of crush map changes are definitely not for the dashboard.  But it may help you get a useful view of the state of your cluster.
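
If you just want to check whether the dashboard is already up (with cephadm it usually is), these standard mgr commands will tell you and show the URL:

ceph mgr module enable dashboard   # harmless if it is already enabled
ceph mgr services                  # lists the dashboard URL if it is running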

Best of luck,
Josh Beaman

From: Thorne Lawler <thorne@xxxxxxxxxxx>
Date: Tuesday, May 30, 2023 at 9:52 AM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [EXTERNAL]  Custom CRUSH maps HOWTO?
Hi folks!

I have a production Ceph 17.2.6 cluster with 6 machines in it - four
newer, faster machines with 4x3.84TB NVME drives each, and two with
24x1.68TB SAS disks each.

I know I should have done something smart with the CRUSH maps for this
up front, but until now I have shied away from CRUSH maps as they sound
really complex.

Right now my cluster's performance, especially write performance, is not
what it needs to be, and I am looking for advice:

1. How should I be structuring my crush map, and why?

2. How does one actually edit and manage a CRUSH map? What /commands/
does one use? This isn't clear at all in the documentation. Are there
any GUI tools out there for managing CRUSH?

3. Is this going to impact production performance or availability while
I'm configuring it? I have tens of thousands of users relying on this
thing, so I can't take any risks.

Thanks in advance!

--

Regards,

Thorne Lawler - Senior System Administrator
*DDNS* | ABN 76 088 607 265
First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
P +61 499 449 170


Please note: The information contained in this email message and any
attached files may be confidential information, and may also be the
subject of legal professional privilege. If you are not the intended
recipient any use, disclosure or copying of this email is unauthorised.
If you received this email in error, please notify Discount Domain Name
Services Pty Ltd on 03 9815 6868 to report this matter and delete all
copies of this transmission together with any attachments.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx