The larger the value of K relative to M, the better the raw :: usable ratio ends up. There are tradeoffs and caveats, though. Here are some of my thoughts; if I'm off-base here, I welcome enlightenment.

When possible, it's ideal to have at least K+M failure domains — often racks, sometimes hosts, chassis, etc. Thus smaller clusters, say with 5-6 nodes, aren't good fits for larger sums of K+M if your data is valuable.

Larger sums of K+M also mean that more drives are touched by each read or write, especially during recovery. This could be a factor if one is IOPS-limited. Same with scrubs.

When using a pool for, e.g., RGW buckets, larger sums of K+M may result in greater overhead when storing small objects, since AIUI Ceph / RGW writes only full stripes. Say you have an EC pool of 17,3 on drives with the default 4kB bluestore_min_alloc_size. A 1kB S3 object would thus allocate (17+3) = 20 x 4kB == 80kB of storage, which is 7900% overhead. This is an extreme example to illustrate the point; there's a quick sketch of the arithmetic at the bottom of this mail.

Larger sums of K+M may also present more IOPS to each storage drive, depending on workload and the EC plugin selected.

With larger objects (including RBD) the relative overhead from padding out the last stripe is dramatically smaller. One's use-case and dataset per-pool may thus inform the EC profiles that make sense; workloads that are predominantly smaller objects might opt for replication instead.

There was a post, maybe a year ago, suggesting that values with small prime factors are advantageous, but I never saw a discussion of why that might be.

In some cases where one might be pressured to use replication with only 2 copies of data, a 2,2 EC profile can achieve the same raw :: usable efficiency with greater safety.

Geo / stretch clusters, or clusters in challenging environments, are a special case; they might choose values of M equal to or even larger than K.

That said, I think 4,2 is a reasonable place to *start*, adjusted to one's specific needs. You get a raw :: usable ratio of 1.5 without getting too complicated. ymmv

> Hi,
>
> It depends on hardware, failure domain, use case, overhead.
> I don't see an easy way to choose k and m values.
>
> -
> Etienne Menguy
> etienne.menguy@xxxxxxxx
>
>
>> On 4 Oct 2021, at 16:57, Golasowski Martin <martin.golasowski@xxxxxx> wrote:
>>
>> Hello guys,
>> how does one estimate the number of chunks for an erasure coded pool (k = ?)? I see that the number of m chunks determines the pool's resiliency, however I did not find a clear guideline on how to determine k.
>>
>> Red Hat states that they support only the following combinations:
>>
>> k=8, m=3
>> k=8, m=4
>> k=4, m=2
>>
>> without any rationale behind them.
>> The table is taken from https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/storage_strategies_guide/erasure_code_pools.
>>
>> Thanks!
>>
>> Regards,
>> Martin
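
P.S. For anyone who wants to poke at the arithmetic, here is a quick back-of-the-envelope sketch in Python. It is only my own illustration of the reasoning above (the function names and the "pad every chunk up to min_alloc_size" simplification are mine, not anything from Ceph itself), but it reproduces the 17,3 / 1kB / 4kB example (80kB allocated, 7900% overhead) and the 1.5 raw :: usable ratio for 4,2.

# Rough EC sizing arithmetic -- my own sketch, not Ceph code.

def raw_to_usable_ratio(k, m):
    """Raw capacity consumed per unit of usable data."""
    return (k + m) / k

def small_object_allocation(object_bytes, k, m, min_alloc_bytes=4096):
    """Worst-case bytes allocated for one object on a K,M EC pool,
    assuming the object is padded to full stripes and each chunk is
    rounded up to min_alloc_size (the simplification used above)."""
    stripe_bytes = k * min_alloc_bytes            # data portion of one stripe
    stripes = -(-object_bytes // stripe_bytes)    # ceiling division
    chunk_bytes = stripes * min_alloc_bytes       # per-OSD chunk, rounded up
    allocated = chunk_bytes * (k + m)             # data + coding chunks
    overhead_pct = (allocated / object_bytes - 1) * 100
    return allocated, overhead_pct

print(raw_to_usable_ratio(4, 2))             # 1.5
print(small_object_allocation(1024, 17, 3))  # (81920, 7900.0), i.e. 80kB for a 1kB object

Bump object_bytes up to a few MB and the relative overhead all but disappears, which is why this mostly matters for small-object RGW workloads rather than RBD.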