On 11/06/2012 11:24 AM, Gandalf Corvotempesta wrote:
> 2012/11/6 Wido den Hollander <wido@xxxxxxxxx>:
>> The setup described on that page has 90 nodes, so one node failing is a
>> little over 1% of the cluster which fails.
> I think I'm missing something.
> In case of a failure, they will always have to resync 36 TB of data,
> no matter if they have 90 servers.
> Each server holds 36 TB, so every time they need to resync the whole server.
Well, you have to keep in mind that when a node fails, the PGs that
resided on that node have to be redistributed over all the other nodes.
So you begin moving about 1% of the data between all the remaining
nodes/OSDs (coming from an OSD that holds the remaining replica of a PG
and going to the new OSD that will get a replica). So you move data from
and to all the remaining OSDs, which gives you a lot of aggregate
bandwidth and therefore a fast recovery to a consistent state.
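
To make that concrete, here is a rough back-of-the-envelope sketch in
Python. The 90 nodes and 36 TB per node come from the setup discussed
above; the 10 Gbit/s per-host network speed and the 2x replication are
my own assumptions, and real recovery is also throttled by disk
throughput and the OSD recovery settings, so take the numbers as an
illustration only.

# Rough recovery math for a whole-node failure, assuming the 90-node /
# 36 TB-per-node cluster above, 2x replication and 10 Gbit/s per host
# (the last two are assumptions, not figures from this thread).

nodes = 90                 # OSD hosts in the cluster
tb_per_node = 36.0         # data stored per host
nic_gbit = 10.0            # assumed per-host network bandwidth (Gbit/s)

failed_data_tb = tb_per_node     # data whose PGs lost one replica
survivors = nodes - 1            # hosts taking part in the recovery

# Because the PGs are spread over all hosts, each surviving host only
# receives (and, on average, sends) its small share of the lost replicas.
per_host_tb = failed_data_tb / survivors

# Time to re-replicate if every survivor streams its share in parallel,
# ignoring protocol overhead and disk limits.
per_host_bytes = per_host_tb * 1e12
seconds = per_host_bytes * 8 / (nic_gbit * 1e9)

print(f"Each surviving host moves about {per_host_tb * 1000:.0f} GB")
print(f"Parallel recovery takes roughly {seconds / 60:.0f} minutes")

So instead of one node pushing 36 TB through a single link, each of the
89 survivors moves on the order of 400 GB, which is why the recovery to
a consistent state is fast.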
Stefan