On 11/06/2012 11:24 AM, Gandalf Corvotempesta wrote:
> 2012/11/6 Wido den Hollander <wido@xxxxxxxxx>:
>> The setup described on that page has 90 nodes, so one node failing is a
>> little over 1% of the cluster which fails.
> I think I'm missing something.
> In case of a failure, they will always have to resync 36 TB of data,
> no matter if they have 90 servers.
> Each server holds 36 TB, so every time they need to resync the whole server.
Well, you have to keep in mind that when a node fails, the PGs that
resided on that node have to be redistributed over all the other nodes.
So you begin moving about 1% of the data between all the remaining
nodes/OSDs (coming from an OSD that holds the remaining replica of a PG
and going to the new OSD that will get a replica). So you move data from
and to all the remaining OSDs, which gives you a lot of aggregate
bandwidth and therefore a fast recovery to a consistent state.
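
To make that concrete, here is a rough back-of-the-envelope sketch in
Python. The 90 nodes and 36 TB per node come from the setup discussed
above; the 10 Gbit/s per-host network speed and the 2x replication are
my own assumptions, and real recovery is also throttled by disk
throughput and the OSD recovery settings, so take the numbers as an
illustration only.

# Rough recovery math for a whole-node failure, assuming the 90-node /
# 36 TB-per-node cluster above, 2x replication and 10 Gbit/s per host
# (the last two are assumptions, not figures from this thread).

nodes = 90                 # OSD hosts in the cluster
tb_per_node = 36.0         # data stored per host
nic_gbit = 10.0            # assumed per-host network bandwidth (Gbit/s)

failed_data_tb = tb_per_node     # data whose PGs lost one replica
survivors = nodes - 1            # hosts taking part in the recovery

# Because the PGs are spread over all hosts, each surviving host only
# receives (and, on average, sends) its small share of the lost replicas.
per_host_tb = failed_data_tb / survivors

# Time to re-replicate if every survivor streams its share in parallel,
# ignoring protocol overhead and disk limits.
per_host_bytes = per_host_tb * 1e12
seconds = per_host_bytes * 8 / (nic_gbit * 1e9)

print(f"Each surviving host moves about {per_host_tb * 1000:.0f} GB")
print(f"Parallel recovery takes roughly {seconds / 60:.0f} minutes")

So instead of one node pushing 36 TB through a single link, each of the
89 survivors moves on the order of 400 GB, which is why the recovery to
a consistent state is fast.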
Stefan