Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

Thank you for your feedback.

We have a failure domain of "node".

The question here is a rather simple one:
when you add a new node whose disks (12TB) are twice the size of the existing disks (6TB) to an existing Ceph cluster, how do you let Ceph distribute the data evenly across all disks?

You mentioned CRUSH: would creating a new "12TB-virtual-disk" CRUSH hierarchy level do the trick?

At this level you would either pick one 12TB HDD on the new node or a pair of 6TB HDDs on an old node.
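
Concretely, a rough sketch of what I have in mind, assuming a hypothetical bucket type (here called "diskgroup") added by editing the CRUSH map offline; all names are placeholders and I have not tested this, so whether it actually yields the even distribution I am after is exactly the open question:

    # export and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # edit crushmap.txt: declare a new bucket type (e.g. "diskgroup")
    # between "osd" and "host", define one diskgroup per 12TB HDD or
    # per pair of 6TB HDDs, and have the pool's CRUSH rule choose its
    # leaves at that level

    # recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new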

Has anyone already experimented with this kind of CRUSH hierarchy?

Regards,

Renaud Miel
NAOJ



________________________________
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: Thursday, October 19, 2023 00:59
To: Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

This is one of many reasons for not using HDDs ;)

One nuance that is easily overlooked is the CRUSH weight of failure domains.

If, say, you have a failure domain of "rack" with size=3 replicated pools and 3x CRUSH racks, and you add the new, larger OSDs to only one rack, you will not increase the cluster's capacity.

If in this scenario you add them as a fourth rack instead, the problem is mostly obviated.  Another strategy is to add them uniformly across the existing racks.
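
As a rough sketch of the fourth-rack approach, assuming the default root and placeholder bucket/host names (rack4, node-new):

    # create a new rack bucket under the default root
    ceph osd crush add-bucket rack4 rack
    ceph osd crush move rack4 root=default

    # move the new host, together with its OSDs, into that rack
    ceph osd crush move node-new rack=rack4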

The larger OSDs will get more PGs, as Herr Sander touches upon.
* The higher IO load can be somewhat ameliorated by adjusting primary-affinity values so that the smaller OSDs are favored as primaries for their PGs.
* The larger OSDs run an increased risk of hitting the mon_max_pg_per_osd limit, especially when an OSD or host fails.  Ensure that this setting is high enough to avoid that; 500 is a reasonable value (example commands below).
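
As a rough illustration of both points (the OSD ID and the exact values are placeholders):

    # bias a larger OSD away from being chosen as primary
    ceph osd primary-affinity osd.42 0.5

    # raise the per-OSD PG limit cluster-wide
    ceph config set global mon_max_pg_per_osd 500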

> On Oct 18, 2023, at 04:05, Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 10/18/23 09:25, Renaud Jean Christophe Miel wrote:
>> Hi,
>> Use case:
>> * Ceph cluster with old nodes having 6TB HDDs
>> * Add new node with new 12TB HDDs
>> Is it supported/recommended to pack 2 6TB HDDs handled by 2 old OSDs
>> into 1 12TB LVM disk handled by 1 new OSD ?
>
> The 12 TB HDD will get twice the IO of one of the 6 TB HDDs.
> But it will still only be able to handle about 120 IOPs.
> This makes the newer larger HDDs a bottleneck when run in the same pool.
>
> If you are not planning to decommission the smaller HDDs it is recommended to use the larger ones in a separate pool for performance reasons.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx