Hi,
1. Two disks would fail, where the two failed disks are not on the
same host? I think Ceph would be able to find a PG placement across
all hosts avoiding the two failed disks, so Ceph would be able to
repair and reach a healthy status after a while?
Yes, if there is enough free disk space and no other OSDs fail during
that time, then Ceph will recover successfully and the PGs will remain
available.
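As a rough sketch of the shard arithmetic behind this (plain Python,
hypothetical helper names, not Ceph API calls): with a 4+2 profile each
PG has 6 shards, one per host, so two failed disks on different hosts
cost any single PG at most 2 shards, and k shards are enough to read
the data.

```python
# Hypothetical sketch of EC shard arithmetic for a k=4, m=2 pool
# on 6 hosts (one shard per host); these are not Ceph API calls.

K, M = 4, 2  # data shards, coding shards


def shards_surviving(failed_shards: int) -> int:
    """Shards a PG still has after losing `failed_shards` of its k+m shards."""
    return K + M - failed_shards


def data_readable(failed_shards: int) -> bool:
    """Data can be reconstructed as long as at least k shards survive."""
    return shards_surviving(failed_shards) >= K


# Two failed disks on different hosts: a PG loses at most 2 shards.
print(data_readable(2))  # True  - still readable, recovery can proceed
print(data_readable(3))  # False - a third concurrent failure loses data
```

While recovery runs, the missing shards are rebuilt onto other OSDs on
the surviving hosts, which is why enough free space matters.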
2. Two complete hosts would fail, say because of broken power
supplies? In this case Ceph would no longer be able to repair the
damage because there are not two more "free" hosts remaining to
satisfy the 4+2 rule (with redundancy at host level). So data would
not be lost, but the cluster might stop delivering data, would be
unable to repair itself, and thus would also be unable to become
healthy again?
Correct, your cluster would remain in a degraded state until you have
6 healthy hosts again. But keep in mind that with EC your pool's
min_size is usually k+1, so in your example the cluster would stop
serving I/O the moment the second host fails: only k = 4 shards would
remain per PG, which is below min_size = 5.
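A minimal sketch of that min_size rule (assuming the usual EC default
min_size = k+1; plain Python, no Ceph calls):

```python
# Sketch: why a 4+2 pool on 6 hosts stops serving I/O after the
# second host failure, assuming the usual EC default min_size = k + 1.

K, M = 4, 2
MIN_SIZE = K + 1  # 5: one shard of slack before I/O pauses


def serves_io(failed_hosts: int) -> bool:
    """With one shard per host, each failed host removes one shard per PG."""
    surviving = K + M - failed_hosts
    return surviving >= MIN_SIZE


print(serves_io(1))  # True  - 5 shards >= min_size 5, I/O continues
print(serves_io(2))  # False - 4 shards <  min_size 5, PGs go inactive
```

So even though m = 2 means no data is lost, client I/O already pauses
at the second host failure.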
Ideally, k+m should be smaller than the number of available hosts so
that your cluster can recover on its own. If you want to be able to
recover from two failed hosts, take that into account when choosing k
and m, i.e. keep k+m at most the number of hosts minus two.
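That sizing rule can be written down as a quick check (a sketch under
the stated assumptions: one shard per host, and no more than m hosts
fail; the inequality k+m <= hosts - f just says the surviving hosts
must still be able to hold all k+m shards):

```python
def can_recover(k: int, m: int, hosts: int, failed_hosts: int) -> bool:
    """True if the surviving hosts can still hold all k+m shards
    (one shard per host), i.e. the pool can rebuild to full redundancy.
    Assumes failed_hosts <= m, otherwise data is already lost."""
    return k + m <= hosts - failed_hosts


# 4+2 on 6 hosts: fits while healthy, but there is no spare host to
# rebuild onto after even one host failure, let alone two.
print(can_recover(4, 2, 6, 0))  # True
print(can_recover(4, 2, 6, 2))  # False
# A 2+2 profile on 6 hosts leaves room to recover from two failed hosts:
print(can_recover(2, 2, 6, 2))  # True
```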
Regards,
Eugen
Quoting Rainer Krienke <krienke@xxxxxxxxxxxxxx>:
Hello,
recently I have been thinking about erasure coding and how to set k+m
in a useful way, also taking into account the number of hosts
available for Ceph. Say I have this setup:
The cluster has 6 hosts and I want to allow two *hosts* to fail
without losing data. So I might choose k+m as 4+2 with redundancy
at host level, but isn't this a little unwise?
What would happen if:
1. Two disks would fail, where the two failed disks are not on the
same host? I think Ceph would be able to find a PG placement across
all hosts avoiding the two failed disks, so Ceph would be able to
repair and reach a healthy status after a while?
2. Two complete hosts would fail, say because of broken power
supplies? In this case Ceph would no longer be able to repair the
damage because there are not two more "free" hosts remaining to
satisfy the 4+2 rule (with redundancy at host level). So data would
not be lost, but the cluster might stop delivering data, would be
unable to repair itself, and thus would also be unable to become
healthy again?
Right or wrong?
Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx