Many thanks for your detailed advice, gents, I very much appreciate it! I read
in various places that for production environments it's advised to keep
(k+m) <= host count. Looks like for my setup that would be 3+2, then. Would it
be best to proceed with 3+2, or should we go with 4+2?

/Z

On Fri, Nov 5, 2021 at 1:33 PM Sebastian Mazza <sebastian@xxxxxxxxxxx> wrote:

> Hi Zakhar,
>
> I don't have much experience with Ceph, so you should read my words with
> reasonable skepticism.
>
> If your failure domain should be the host level, then k=4, m=2 is your most
> space-efficient option for 6 servers that still allows write IO when one of
> the servers fails, assuming you want your pool min_size to be at least k+1.
> However, your cluster will not be able to “heal itself” in the event of a
> single server outage, since 5 servers are not enough to distribute the PGs
> of a “k=4, m=2” pool.
>
> When you add one more server (7 in total) with enough space, your cluster
> will be able to self-heal after a single server outage. Or you can create a
> new CRUSH rule with “k=5, m=2”, “rebalance” your data with this new rule,
> and get better space efficiency.
>
> I think for a production setup you should go with a CRUSH rule that
> establishes a simple host-based failure domain, as you already stated.
> Furthermore, your pool min_size for write IO should be at least k+1, so
> that you are not in immediate danger of losing data after the first
> OSD/host failure. However, for the sake of completeness I want to mention
> that it is possible to create CRUSH rules that do not consider hosts and
> only use the OSD level as the failure domain. This would allow you to use
> erasure coding with a larger k (e.g. k=6, m=2). It is also possible to
> create CRUSH rules that remain resilient against a server outage even
> though the pool has “k+m > your server count”.
> See
> * http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033502.html
> * http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters
>
> Example of a CRUSH rule that uses 5 hosts and 2 OSDs per host:
> ---------------
> rule ec5x2hdd {
>     id 123
>     type erasure
>     min_size 10
>     max_size 10
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default class hdd
>     step choose indep 5 type host
>     step choose indep 2 type osd
>     step emit
> }
> ---------------
> This selects 5 servers and uses 2 OSDs on each server. With an erasure-coded
> pool of “k=7, m=3”, the system could take one failing server or two failing
> HDDs before you lose write IO. It can also survive one host plus one OSD, or
> 3 failing OSDs, before you lose data. This would give you theoretically 70%
> usable space with only 5 active servers, instead of 66% with the simple host
> failure domain on 6 active servers and “k=4, m=2” erasure coding.
> But I don't advise you to do that!
>
>
> Best
> Sebastian
>
>
> > On 05.11.2021, at 06:14, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> >
> > Hi!
> >
> > I've got a Ceph 16.2.6 cluster. The hardware is 6 x Supermicro SSG-6029P
> > nodes, each equipped with:
> >
> > 2 x Intel(R) Xeon(R) Gold 5220R CPUs
> > 384 GB RAM
> > 2 x boot drives
> > 2 x 1.6 TB enterprise NVMe drives (DB/WAL)
> > 2 x 6.4 TB enterprise drives (storage tier)
> > 9 x 9 TB HDDs (storage tier)
> > 2 x Intel XL710 NICs connected to a pair of 40/100GE switches
> >
> > Please help me understand the calculation / choice of the optimal EC
> > profile for this setup.
> > I would like the EC pool to span all 6 nodes, on HDD only, and have the
> > optimal combination of resiliency and efficiency, with the view that the
> > cluster will expand. Previously, when I had only 3 nodes, I tested EC with:
> >
> > crush-device-class=hdd
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=2
> > m=1
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> >
> > I am leaning towards using the above profile with k=4, m=2 for "production"
> > use, but am not sure that I understand the math correctly, that this
> > profile is optimal for my current setup, and that I'll be able to scale it
> > properly by adding new nodes. I would very much appreciate any advice!
> >
> > Best regards,
> > Zakhar
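
For reference, below is a minimal sketch of the commands involved in setting up
the 4+2 profile with a host failure domain on HDDs that the thread discusses.
The profile name "ec42hdd", the pool name "ecpool-hdd", and the pg_num of 128
are placeholders chosen for illustration, not values from the thread; adjust
them for the actual cluster. The 3+2 variant is identical apart from k=3.
---------------
# Define the EC profile: 4 data + 2 coding chunks, one chunk per host, HDD class only.
ceph osd erasure-code-profile set ec42hdd \
    k=4 m=2 \
    plugin=jerasure technique=reed_sol_van \
    crush-failure-domain=host crush-device-class=hdd

# Inspect what was stored.
ceph osd erasure-code-profile get ec42hdd

# Create an erasure-coded pool that uses the profile (pg_num/pgp_num are examples).
ceph osd pool create ecpool-hdd 128 128 erasure ec42hdd

# Keep min_size at k+1 so the pool stops accepting writes before it is reduced
# to exactly k surviving chunks.
ceph osd pool get ecpool-hdd min_size
ceph osd pool set ecpool-hdd min_size 5

# Only if the pool will back RBD or CephFS:
# ceph osd pool set ecpool-hdd allow_ec_overwrites true
---------------
Note that k and m are fixed once a pool is created; switching between 3+2 and
4+2 later means creating a new pool with a new profile and migrating the data,
so the choice is worth settling before production data lands on it.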