Hi Zakhar,

I don't have much experience with Ceph, so you should read my words with reasonable skepticism.

If your failure domain should be the host level, then k=4, m=2 is your most space-efficient option for 6 servers that still allows write IO while one of the servers is down, assuming you want your pool min_size to be at least k+1. However, your cluster will not be able to “heal itself” after a single server outage, since 5 remaining servers are not enough to place all k+m=6 chunks of a “k=4, m=2” pool with a host failure domain. Once you add one more server (7 in total) with enough space, the cluster will be able to self-heal from a single server outage. Alternatively, you can create a new profile and CRUSH rule with “k=5, m=2”, “rebalance” your data onto it, and get better space efficiency.

I think for a production setup you should go with a CRUSH rule that establishes a simple host-based failure domain, like you already stated. Furthermore, your pool min_size for write IO should be at least k+1, so that you are not in immediate danger of losing data after the first OSD/host failure.

However, for the sake of completeness I want to mention that it is possible to create CRUSH rules that do not consider hosts and only use the OSD level as failure domain. This would also allow you to use erasure coding with a larger k (e.g. k=6, m=2). Furthermore, it is possible to create CRUSH rules that stay resilient against a server outage even while the pool has “k+m > your server count”. See

* http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033502.html
* http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters

Example of a CRUSH rule that uses 5 hosts and 2 OSDs per host:

---------------
rule ec5x2hdd {
        id 123
        type erasure
        min_size 10
        max_size 10
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step choose indep 5 type host
        step choose indep 2 type osd
        step emit
}

This selects 5 servers and uses 2 OSDs on each server. With an erasure-coded pool of “k=7, m=3” the system could take one failing server or two failing HDDs before you lose write IO. It can also survive one host plus one OSD, or three OSD failures, before you lose data. This would theoretically give you 70% usable space (k / (k+m) = 7/10) with only 5 active servers, instead of 66% (4/6) with the simple host failure domain on 6 active servers and “k=4, m=2” erasure coding. But I don't advise you to do that!
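In case it is useful, here is a rough sketch of how such a “k=4, m=2” host-based profile and pool could be created. This is untested on my side, and the names “ec42hdd” / “ecpool” as well as the PG count are only placeholders, so please check the exact syntax against the documentation for your release:

ceph osd erasure-code-profile set ec42hdd \
        k=4 m=2 \
        plugin=jerasure technique=reed_sol_van \
        crush-device-class=hdd crush-failure-domain=host

ceph osd pool create ecpool 128 128 erasure ec42hdd

# min_size should end up at k+1 (= 5 here); recent releases default to
# that for EC pools, but it is worth double-checking
ceph osd pool set ecpool min_size 5

With a host failure domain this places one chunk per host across your 6 servers, and the same profile keeps working unchanged as you add nodes; the extra hosts simply give the cluster the room it needs to self-heal after a host outage.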
Best
Sebastian

> On 05.11.2021, at 06:14, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>
> Hi!
>
> I've got a CEPH 16.2.6 cluster, the hardware is 6 x Supermicro SSG-6029P
> nodes, each equipped with:
>
> 2 x Intel(R) Xeon(R) Gold 5220R CPUs
> 384 GB RAM
> 2 x boot drives
> 2 x 1.6 TB enterprise NVME drives (DB/WAL)
> 2 x 6.4 TB enterprise drives (storage tier)
> 9 x 9TB HDDs (storage tier)
> 2 x Intel XL710 NICs connected to a pair of 40/100GE switches
>
> Please help me understand the calculation / choice of the optimal EC
> profile for this setup. I would like the EC pool to span all 6 nodes on HDD
> only and have the optimal combination of resiliency and efficiency with the
> view that the cluster will expand. Previously when I had only 3 nodes I
> tested EC with:
>
> crush-device-class=hdd
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> I am leaning towards using the above profile with k=4,m=2 for "production"
> use, but am not sure that I understand the math correctly, that this
> profile is optimal for my current setup, and that I'll be able to scale it
> properly by adding new nodes. I would very much appreciate any advice!
>
> Best regards,
> Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx