Re: Erasure coding best practice

> To be honest, with 3:8 we could protect the cluster more from OSD flapping.
> Put differently, it is less likely that 8 PGs go down on 8 separate nodes than that, with 8:3, only 3 PGs go down on 3 nodes.
> Of course this comes at a cost in storage used.
> Is there any disadvantage performance-wise to this?

A few years back someone asserted that EC values with small prime factors are advantageous, so 23,11 would be doubleplus ungood.

In general EC comes with a write tradeoff: each write (today) touches k+m hosts/OSDs, which means increased network amplification and write IOPS amplification. The latter can be painful for rotational OSDs. This additionally means that recovery/backfill is slower, as more OSDs are tied up with each op and thus fewer PGs can recover in parallel.
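As a rough illustration of that fan-out arithmetic (a back-of-the-envelope sketch, not a benchmark; the profiles chosen and the assumption that every write touches all shards are mine, and real partial-stripe overwrites can cost even more):

```python
# Back-of-the-envelope sketch (not a benchmark): per-write fan-out and
# raw/usable storage ratio for a few EC profiles. Assumes a full-stripe
# write touches all k + m shards.

def write_fanout(k: int, m: int) -> int:
    """OSDs (and hence network sends / disk IOs) touched per write."""
    return k + m

for k, m in [(8, 3), (4, 2), (3, 8)]:
    print(f"EC {k}+{m}: {write_fanout(k, m)} OSDs per write, "
          f"raw/usable space ratio {(k + m) / k:.2f}x")
```

Note that 8:3 and 3:8 both touch 11 OSDs per write, but 3:8 carries a 3.67x raw-to-usable ratio versus 1.38x, which is the storage cost mentioned above.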

>> I'm trying to understand what the benefit of the higher number of coding chunks is. Can you use a smaller object size?
>> Let's say with 4:2 the minimum object size should be at least 24K, or with 8:3 it would be 44K, because nothing will be stored in less space than that.
>> In the case of k=3 m=8, the smallest object can be 12K, but you can lose 8 nodes (PGs) and the data is still there?

BlueStore will allocate no less than min_alloc_size per shard on a given OSD; the default is 4 KB for most media. The above seems to assume that parity shards take no space, but in fact an m shard consumes space just like a k shard, so the minimum on-disk footprint of an object is governed by the total k+m, not by how it is split: with k=3 m=8 a tiny object still occupies at least 11 x 4 KB = 44 KB.
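A minimal sketch of that floor, assuming the default 4 KiB min_alloc_size (the helper name is mine, not a Ceph API):

```python
# Minimal sketch, assuming BlueStore's default min_alloc_size of 4 KiB:
# every shard -- parity (m) as well as data (k) -- consumes at least one
# allocation unit, so a tiny object's floor is (k + m) * min_alloc_size.
MIN_ALLOC_KIB = 4

def min_object_footprint_kib(k: int, m: int) -> int:
    return (k + m) * MIN_ALLOC_KIB

for k, m in [(4, 2), (8, 3), (3, 8)]:
    print(f"k={k} m={m}: a tiny object occupies >= "
          f"{min_object_footprint_kib(k, m)} KiB on disk")
```

Under these assumptions 8:3 and 3:8 land on the same 44 KiB floor, so k=3 m=8 buys no small-object savings, only the extra failure tolerance.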

https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253
(Bluestore Space Amplification Cheat Sheet)

>> 
>> Ty
> --
> Alexander Patrakov

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
