Thanks! I'll stick to 3:2 for now then.

/Z

On Fri, Nov 5, 2021 at 1:55 PM Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:

> With 6 servers I'd go with 3:2, with 7 you can go with 4:2.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> -----Original Message-----
> From: Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> Sent: Friday, November 5, 2021 6:45 PM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject: Re: Optimal Erasure Code profile?
>
> Many thanks for your detailed advice, gents, I very much appreciate it!
>
> I read in various places that for production environments it's advised to keep (k+m) <= host count. Looks like for my setup it is 3+2 then. Would it be best to proceed with 3+2, or should we go with 4+2?
>
> /Z
>
> On Fri, Nov 5, 2021 at 1:33 PM Sebastian Mazza <sebastian@xxxxxxxxxxx> wrote:
>
> > Hi Zakhar,
> >
> > I don't have much experience with Ceph, so you should read my words with reasonable skepticism.
> >
> > If your failure domain should be the host level, then k=4, m=2 is your most space-efficient option for 6 servers that still allows write IO when one of the servers fails, assuming that you want your pool min_size to be at least k+1. However, your cluster will not be able to “heal itself” in the event of a single server outage, since 5 servers are not enough to distribute the PGs of a “k=4, m=2” pool.
> >
> > When you add one more server (7 in total) with enough space, your cluster will be able to self-heal from one server outage. Or you can create a new CRUSH rule with “k=5, m=2”, “rebalance” your data with this new rule, and get better space efficiency.
> >
> > I think for a production setup you should go with a CRUSH rule that establishes a simple host-based failure domain, as you already stated. Furthermore, your pool min_size for write IO should be at least k+1, so that you are not in immediate danger of losing data after the first OSD/host failure. However, for the sake of completeness I want to mention that it is possible to create CRUSH rules that do not consider hosts and only use the OSD level as failure domain. This would allow you to use erasure coding with a larger k (e.g. k=6, m=2). Furthermore, it is also possible to create CRUSH rules that remain resilient against a server outage even while the pool has “k+m > your server count”.
> > See
> > * http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033502.html
> > * http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters
> >
> > Example of a CRUSH rule that uses 5 hosts and 2 OSDs per host:
> > ---------------
> > rule ec5x2hdd {
> >     id 123
> >     type erasure
> >     min_size 10
> >     max_size 10
> >     step set_chooseleaf_tries 5
> >     step set_choose_tries 100
> >     step take default class hdd
> >     step choose indep 5 type host
> >     step choose indep 2 type osd
> >     step emit
> > }
> > ---------------
> > This selects 5 servers and uses 2 OSDs on each server. With an erasure-coded pool of “k=7, m=3” the system could take one failing server or two failing HDDs before you lose write IO. The system can also survive one host plus one OSD, or 3 OSD failures, before you lose data.
> > This would give you theoretically 70% usable space with only 5 active servers, instead of 66% with the simple host failure domain on 6 active servers and “k=4, m=2” erasure coding.
> > But I don't advise you to do that!
> >
> > Best
> > Sebastian
> >
> > > On 05.11.2021, at 06:14, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> > >
> > > Hi!
> > >
> > > I've got a Ceph 16.2.6 cluster, the hardware is 6 x Supermicro SSG-6029P nodes, each equipped with:
> > >
> > > 2 x Intel(R) Xeon(R) Gold 5220R CPUs
> > > 384 GB RAM
> > > 2 x boot drives
> > > 2 x 1.6 TB enterprise NVMe drives (DB/WAL)
> > > 2 x 6.4 TB enterprise drives (storage tier)
> > > 9 x 9 TB HDDs (storage tier)
> > > 2 x Intel XL710 NICs connected to a pair of 40/100GE switches
> > >
> > > Please help me understand the calculation / choice of the optimal EC profile for this setup. I would like the EC pool to span all 6 nodes on HDD only and have the optimal combination of resiliency and efficiency, with the view that the cluster will expand. Previously, when I had only 3 nodes, I tested EC with:
> > >
> > > crush-device-class=hdd
> > > crush-failure-domain=host
> > > crush-root=default
> > > jerasure-per-chunk-alignment=false
> > > k=2
> > > m=1
> > > plugin=jerasure
> > > technique=reed_sol_van
> > > w=8
> > >
> > > I am leaning towards using the above profile with k=4, m=2 for "production" use, but am not sure that I understand the math correctly, that this profile is optimal for my current setup, and that I'll be able to scale it properly by adding new nodes. I would very much appreciate any advice!
> > >
> > > Best regards,
> > > Zakhar
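For reference, a minimal sketch of how the 3+2 profile settled on above could be created on the HDD class with a host failure domain. The profile and pool names (ec32hdd, ecpool_hdd) are placeholders, and pg_num is left to the pg_autoscaler, which is on by default in 16.2:
---------------
# create an EC profile matching the parameters discussed in the thread
ceph osd erasure-code-profile set ec32hdd k=3 m=2 plugin=jerasure technique=reed_sol_van crush-root=default crush-failure-domain=host crush-device-class=hdd
# verify the profile before using it
ceph osd erasure-code-profile get ec32hdd
# create the EC pool from that profile
ceph osd pool create ecpool_hdd erasure ec32hdd
# confirm min_size is k+1 = 4, as Sebastian recommends, and set it if needed
ceph osd pool get ecpool_hdd min_size
ceph osd pool set ecpool_hdd min_size 4
---------------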
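If anyone does experiment with a multi-OSD-per-host rule like Sebastian's ec5x2hdd example, it can be sanity-checked offline with crushtool before being injected into the cluster. A rough sketch, where rule id 123 and --num-rep 10 follow his example and the file names are arbitrary:
---------------
# dump and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt to add the custom rule, then recompile it
crushtool -c crushmap.txt -o crushmap-new.bin
# simulate placements for a 10-chunk (k=7, m=3) pool against the new rule
crushtool -i crushmap-new.bin --test --rule 123 --num-rep 10 --show-mappings
# only then apply the new map to the cluster
ceph osd setcrushmap -i crushmap-new.bin
---------------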