On Fri, Aug 27, 2021 at 12:43, Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> recently I thought about erasure coding and how to set k+m in a useful
> way, also taking into account the number of hosts available for Ceph. Say
> I would have this setup:
>
> The cluster has 6 hosts and I want to allow two *hosts* to fail without
> losing data. So I might choose k+m as 4+2 with redundancy at host
> level, but isn't this a little unwise?

Yes. You should have more hosts for EC 4+2, or a smaller k.

> What would happen if:
> 1. Two disks fail, where the two failed disks are not on the same
> host? I think Ceph would be able to find a PG distributed across all
> hosts avoiding the two failed disks, so Ceph would be able to repair and
> reach a healthy status after a while?

Yes, if other disks are available to spill over to.

> 2. Two complete hosts fail, say because of broken power supplies.
> In this case Ceph would no longer be able to repair the damage, because
> there are not two more "free" remaining hosts to satisfy the 4+2 rule
> (with redundancy at host level). So data would not be lost, but the
> cluster might stop delivering data and would be unable to repair, and
> thus would also be unable to become healthy again?
>
> Right or wrong?

In the second case, the cluster stops until at least one new host appears; only then can it start repairing, and only after repairs have produced at least one more shard for your EC objects will it start serving data again. Until then it is also in a very dangerous state, in case a third drive or host fails.

--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
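The failure arithmetic behind both answers above can be sketched numerically. This is a rough illustration, assuming one shard per host (which is forced when k+m equals the host count, as in the 4+2-on-6-hosts example) and Ceph's default min_size of k+1 for EC pools; the function name and return strings are purely illustrative, not Ceph API:

```python
def ec_pg_state(k, m, hosts, failed_hosts, min_size=None):
    """Rough PG state for an EC pool with crush-failure-domain=host,
    assuming one shard per host (valid when hosts >= k + m)."""
    if min_size is None:
        min_size = k + 1                        # Ceph's default for EC pools
    surviving = (k + m) - failed_hosts          # one shard lost per dead host
    spare = (hosts - failed_hosts) - surviving  # hosts free to rebuild onto
    if surviving < k:
        return "data lost"
    state = "active" if surviving >= min_size else "inactive (data intact)"
    if spare <= 0 and failed_hosts > 0:
        state += ", cannot fully recover until a host returns"
    return state

# 4+2 on 6 hosts, as in the question:
print(ec_pg_state(4, 2, 6, 1))  # one host down: still active, but no spare host to rebuild onto
print(ec_pg_state(4, 2, 6, 2))  # two hosts down: data intact but PGs inactive, exactly the scenario above
```

With only 6 hosts there is never a spare host for recovery after a host failure, which is why more hosts (or a smaller k) are advisable for 4+2.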