Re: About erasure code for larger hdd


 



Agreed, 3+2 (k=3, m=2) is the widest you can safely go with only six failure domains.
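
For reference, creating such a profile and pool looks something like the below; the profile name, pool name, and PG count here are just placeholders, so double-check against the docs for your release:

    ceph osd erasure-code-profile set ec-3-2 k=3 m=2 crush-failure-domain=host
    ceph osd erasure-code-profile get ec-3-2
    ceph osd pool create ecpool 128 128 erasure ec-3-2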

As Lukasz touches upon, ultra-dense nodes are especially problematic when there are only a few of them.  You will want to set mon_osd_down_out_subtree_limit so that the cluster does not automatically mark out an entire down node and start recovery; otherwise a single lost node can trigger a rebalance the remaining five nodes cannot absorb, and you’ll find your cluster refusing to backfill or even refusing writes once it hits its full ratios.
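
With the centralized config store that is a one-liner, roughly as below (treat it as a sketch and confirm the scope for your release; on older releases the same option goes under [mon] in ceph.conf):

    ceph config set mon mon_osd_down_out_subtree_limit host
    ceph config get mon mon_osd_down_out_subtree_limit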

HDDs are incompatible with consistent throughput.  Whatever you measure initially with an empty cluster you will never see again, as the drives fill up and become increasingly fragmented.  You will spend a lot of time in rotational and especially seek latency.  There’s also a good chance your HBAs will be saturated, and your SATA interfaces absolutely will be.  It is not uncommon for HDD deployments to cap unit capacity at 8 TB for this reason.  Figure at most 70 MB/s of real-world write throughput to a given HDD, and remember that with 3+2 each client write has to touch five drives.  Recovery/backfill will measurably impact your client experience.
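
Back-of-the-envelope, assuming the 6 x 90 layout from the original post and hand-waving away BlueStore metadata and WAL overhead:

    540 HDDs x 70 MB/s          =  ~37.8 GB/s raw write bandwidth
    3+2 writes 5 chunks for every 3 chunks of client data (5/3 amplification)
    37.8 GB/s x 3/5             =  ~22 GB/s theoretical aggregate client writes

In practice you will see a fraction of that once seek latency, HBA/SATA saturation, and recovery traffic enter the picture.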

These are among the false economies of spinners.

> On Dec 9, 2024, at 8:25 AM, Lukasz Borek <lukasz@xxxxxxxxxxxx> wrote:
> 
> I'd start with 3+2, so you have one node left for recovery in case one
> fails. 6 nodes with 90 HDDs per node sounds like a long recovery that needs
> to be tested for sure.
> 
> On Mon, 9 Dec 2024 at 06:10, Phong Tran Thanh <tranphong079@xxxxxxxxx>
> wrote:
> 
>> Hi community,
>> 
>> Please help with advice on selecting an erasure coding algorithm for a
>> 6-node cluster with 540 OSDs. What would be the appropriate values for *k*
>> and *m*? The cluster requires a high level of HA and consistent
>> throughput.
>> 
>> Email: tranphong079@xxxxxxxxx
>> 
> 
> 
> -- 
> Łukasz Borek
> lukasz@xxxxxxxxxxxx



