Re: OSD based ec-code

David Orman <ormandj@xxxxxxxxxxxx> · Tue, 14 Sep 2021 08:54:51 -0500

Keep performance in mind as well. Once you start getting into higher
'k' values with EC, you've got a lot more drives involved that need to
return completions for each operation, and on rotational drives this
becomes especially painful. We use 8+3 for a lot of our purposes, as
it's a good balance of efficiency, durability (the number of complete
host failures we can tolerate), and adequate performance. It's
definitely significantly slower than something like 4+2 or 3x
replicated, though.
It also means we don't deploy clusters below 14 hosts, so we can
tolerate multiple host failures _and still accept writes_. It never
fails: you have a host issue, and while you're working on that, another
host dies. It's the same lesson many learn with RAID arrays that have
single-drive redundancy: lose a drive, start a rebuild, another drive
fails, and the data is gone. It's almost always correct to err on the
side of durability in these decisions, unless the data is unimportant
and maximum performance is required.
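
To put rough numbers on that trade-off, here is a minimal Python sketch
comparing a few k+m profiles. It is not from any Ceph tooling; the
ECProfile class and its field names are made up for illustration. It
assumes the failure domain is host (one chunk per host) and that the
pool's min_size is k+1, which you should verify on your own pools. The
"spare hosts" figure reflects my reading of why 14 hosts go with 8+3:
11 hosts hold the chunks and the rest give recovery somewhere to
rebuild.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECProfile:
    """Hypothetical helper for comparing erasure-code profiles."""

    k: int  # data chunks
    m: int  # coding chunks

    @property
    def chunks(self) -> int:
        # With a host failure domain, each chunk lands on a different host.
        return self.k + self.m

    @property
    def efficiency(self) -> float:
        # Fraction of raw capacity that ends up holding user data.
        return self.k / self.chunks

    @property
    def min_size(self) -> int:
        # ASSUMPTION: min_size = k + 1; check the actual pool setting.
        return self.k + 1

    @property
    def failures_without_data_loss(self) -> int:
        # Any m chunks can be lost and the data is still recoverable.
        return self.m

    @property
    def failures_still_writable(self) -> int:
        # Chunks that may be missing before a PG drops below min_size.
        return self.chunks - self.min_size

    def spare_hosts(self, hosts: int) -> int:
        # Hosts beyond k+m; negative means the profile cannot be placed.
        return hosts - self.chunks


if __name__ == "__main__":
    hosts = 14  # cluster size mentioned above
    for p in (ECProfile(4, 2), ECProfile(8, 3), ECProfile(12, 3)):
        print(
            f"{p.k}+{p.m}: efficiency={p.efficiency:.0%}, "
            f"chunks per write={p.chunks}, "
            f"host failures without data loss={p.failures_without_data_loss}, "
            f"host failures while still writable={p.failures_still_writable}, "
            f"spare hosts at {hosts}={p.spare_hosts(hosts)}"
        )
```

Under those assumptions, 12+3 buys more usable capacity than 8+3 but
touches 15 hosts per write and does not fit on 14 hosts at all, which
is the efficiency versus performance and durability tension described
above.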

On Tue, Sep 14, 2021 at 8:20 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi,
>
> Consider yourself lucky that you haven't had a host failure. But I
> would not draw the wrong conclusion here and change the
> failure domain based on luck.
> In our production cluster we have an EC pool for archive purposes. It
> all went well for quite some time, then last Sunday one of the hosts
> suddenly failed; we're still investigating the root cause. Our
> failure domain is host, and I'm glad that we chose a suitable EC
> profile for that: the cluster is healthy.
>
> > Also, what is the "optimal" profile, something like 12:3?
>
> You should evaluate that the other way around. What are your specific
> requirements regarding resiliency (how many hosts can fail at the same
> time without data loss)? How many hosts are available? Are you
> planning to expand in the near future? Based on this evaluation you
> can narrow it down to a few options and choose the best one for your
> requirements.
>
> Regards,
> Eugen
>
>
> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
>
> > Hi,
> >
> > What's your take on an OSD-based ec-code setup? I've never been
> > brave enough to use an OSD-based CRUSH rule because I was afraid of
> > a host failure, but in the last 4 years we have never had any host
> > issue, so I'm thinking of switching to it and using a more
> > cost-effective EC profile.
> >
> > Also, what is the "optimal" profile, something like 12:3?
> >
> > Thank you
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx