Re: fault tolerant about erasure code pool

On Fri, 26 Jun 2020 at 10:32, Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:

> Hi all,
>
> I'm going to deploy a cluster with an erasure-coded pool for cold storage.
> I have 3 servers to set up the cluster, with 12 OSDs on each server.
> If I set the EC profile to k=4 and m=2, does that mean the data is safe
> while 1/3 of the cluster's OSDs are down, or only while 2 OSDs are down?
>

By default, CRUSH will want to place each piece (6 in your case, for EC 4+2)
on a host of its own, to maximize data safety. Since you can't do that with
3 hosts, you must make sure that no more than 2 pieces ever end up on a
single host. That means you can't simply switch from failure-domain=host to
failure-domain=osd, since CRUSH could then put all 6 pieces on different
OSDs of the same host, which would be bad.
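
As a concrete illustration (the profile name "ec42" is just an example I
made up), this is roughly how such a profile is created; the default
crush-failure-domain is host, which is exactly what 3 hosts cannot satisfy
for 6 pieces:

  # default failure domain: one piece per host, i.e. 6 hosts for k=4, m=2
  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd erasure-code-profile get ec42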

You need the CRUSH rule to pick two different OSDs per host, but no more.
One way is to build a tree where each host has half of its OSDs in one
branch and the other half in another (let's call that bucket type "subhost"
in this example). That gives you 3*2 = 6 subhosts; if you let CRUSH pick
placement from the subhosts, it will always put two pieces per OSD host,
never two on the same OSD, and it will still allow one host to be down for
a while.
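
Here is a rough sketch of what that could look like in a decompiled CRUSH
map. The bucket type "subhost", the names node1-a/node1-b, the ids, weights
and the rule name are all made up for this example; adjust them to your
real map. Export with "ceph osd getcrushmap -o map.bin" and
"crushtool -d map.bin -o map.txt", edit along these lines, recompile with
"crushtool -c map.txt -o map.new" and load it with
"ceph osd setcrushmap -i map.new":

  # types (excerpt): add a level between osd and host and renumber the
  # remaining types, or reuse an unused existing type such as chassis
  type 0 osd
  type 1 subhost
  type 2 host

  # each host is split into two subhost buckets holding half of its OSDs
  subhost node1-a {
      id -11
      alg straw2
      hash 0  # rjenkins1
      item osd.0 weight 1.000
      item osd.1 weight 1.000
      item osd.2 weight 1.000
      item osd.3 weight 1.000
      item osd.4 weight 1.000
      item osd.5 weight 1.000
  }
  subhost node1-b {
      id -12
      alg straw2
      hash 0  # rjenkins1
      item osd.6 weight 1.000
      item osd.7 weight 1.000
      item osd.8 weight 1.000
      item osd.9 weight 1.000
      item osd.10 weight 1.000
      item osd.11 weight 1.000
  }
  host node1 {
      id -2
      alg straw2
      hash 0  # rjenkins1
      item node1-a weight 6.000
      item node1-b weight 6.000
  }
  # node2 and node3 get the same split

  # EC rule that picks 6 distinct subhosts, i.e. at most 2 pieces per host
  rule ec42_subhost {
      id 2
      type erasure
      step set_chooseleaf_tries 5
      step set_choose_tries 100
      step take default
      step chooseleaf indep 0 type subhost
      step emit
  }

The pool is then created against that rule, something like
"ceph osd pool create cold 128 128 erasure ec42 ec42_subhost".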

I would like to add that data is not very secure once you have no
redundancy left at all. Machines will crash, and they will need
maintenance, patches, BIOS updates and the like. Having NO redundancy
during planned or unplanned downtime puts the data at huge risk: _any_
surprise in that situation would immediately lead to data loss.

Also, if one box dies, the cluster can't serve I/O (one host down leaves
only 4 of the 6 pieces, below the default min_size of k+1 = 5) and can't
recover until you get a replacement host in, so you are already running at
the edge of data safety in the normal case. Even if this will "work", Ceph,
being a cluster, really should have N+1 hosts or more if your data split
(replication size or EC k+m) is equal to N.

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


