Re: A simple erasure-coding question about redundancy


Hi,

1. Two disks fail, where the two failed disks are not on the same host? I think Ceph would be able to place the PGs across all hosts, avoiding the two failed disks, so it would be able to repair and reach a healthy status after a while?

Yes, if there is enough disk space and no other OSDs fail during that time, Ceph will recover successfully and the PGs will remain available.


2. Two complete hosts fail, say because of broken power supplies? In this case Ceph would no longer be able to repair the damage, because there are no two further "free" hosts remaining to satisfy the 4+2 rule (with redundancy at host level). So data would not be lost, but the cluster might stop serving data, would be unable to repair itself, and thus would also be unable to become healthy again?

Correct, your cluster would be in a degraded state until you have 6 hosts again. But keep in mind that with EC, a pool's min_size is usually k+1, so in your example the cluster would stop serving I/O the moment the second host fails. Ideally, k+m should be smaller than the number of available hosts so the cluster can recover. If you want to be able to recover from two failed hosts, take that into consideration when choosing k and m.
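To make that arithmetic concrete, here is a small Python sketch. This is not Ceph code; the helper ec_pool_status and the assumption of exactly one EC shard per host are mine, and min_size = k+1 is the usual default for EC pools, not a law:

```python
# Illustrative sketch (not Ceph code): models the host-level arithmetic
# for an erasure-coded pool whose CRUSH failure domain is "host",
# assuming one shard per host and the usual EC default min_size = k+1.

def ec_pool_status(hosts_total, hosts_failed, k, m):
    """Return (serving_io, can_recover) for a k+m EC pool."""
    min_size = k + 1                        # usual default for EC pools
    hosts_up = hosts_total - hosts_failed
    serving_io = hosts_up >= min_size       # enough shards left to serve I/O
    can_recover = hosts_up >= k + m         # enough hosts to re-place all shards
    return serving_io, can_recover

# The scenario from the question: 6 hosts, k=4, m=2.
print(ec_pool_status(6, 0, 4, 2))  # (True, True)   healthy
print(ec_pool_status(6, 1, 4, 2))  # (True, False)  degraded; no spare host to rebuild on
print(ec_pool_status(6, 2, 4, 2))  # (False, False) I/O stops: 4 hosts < min_size of 5

# A 7th host would let the cluster fully recover from one host failure:
print(ec_pool_status(7, 1, 4, 2))  # (True, True)
```

With 6 hosts and k+m = 6, the pool works but has zero headroom: any host failure leaves nowhere to rebuild the missing shards, which is exactly the "k+m smaller than the number of hosts" recommendation above.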

Regards,
Eugen


Quoting Rainer Krienke <krienke@xxxxxxxxxxxxxx>:

Hello,

recently I have been thinking about erasure coding and how to set k+m in a useful way, also taking into account the number of hosts available to Ceph. Say I have this setup:

The cluster has 6 hosts and I want to allow two *hosts* to fail without losing data. So I might choose k+m as 4+2 with redundancy at host level, but isn't this a little unwise?

What would happen if:

1. Two disks fail, where the two failed disks are not on the same host? I think Ceph would be able to place the PGs across all hosts, avoiding the two failed disks, so it would be able to repair and reach a healthy status after a while?

2. Two complete hosts fail, say because of broken power supplies? In this case Ceph would no longer be able to repair the damage, because there are no two further "free" hosts remaining to satisfy the 4+2 rule (with redundancy at host level). So data would not be lost, but the cluster might stop serving data, would be unable to repair itself, and thus would also be unable to become healthy again?

Right or wrong?

Thanks a lot
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





