Re: Large numbers of OSD per node

On 11/06/2012 12:31 PM, Gandalf Corvotempesta wrote:
2012/11/6 Stefan Kleijkers <stefan@xxxxxxxxxxxxxxxxxxxx>:
Well, you have to keep in mind that when a node fails, the PGs that resided
on that node have to be redistributed over all the other nodes. So you start
moving about 1% of the data between all the remaining nodes/OSDs (from an
OSD that still holds a replica of the PG to the new OSD that will get a
replica). So you move from and to all the remaining OSDs, which gives you a
lot of aggregate bandwidth and therefore a fast recovery to a consistent
state.
Ok, but in this case, 1% is still 36TB of data.
There is no difference between 3 nodes with 36TB of data each and 90
nodes with 36TB of data each.
In case of a node failure, you always have to move 36TB of data, no
matter how many nodes you have.

True, but it makes a huge difference whether you have to redistribute that 36TB between 2 remaining nodes or between 89 remaining nodes. And with so few nodes you will probably also hit a couple of other bottlenecks, such as CPU power per node, network bandwidth per node, etc. I learned this the hard way with 3 nodes and 24 disks/OSDs per node.
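
To put rough numbers on that (this is only an illustrative back-of-the-envelope sketch, not from the original thread; the 36TB figure comes from the discussion above, but the 1 Gb/s of sustained recovery bandwidth per node is just an assumption):

# Rough estimate of recovery time after losing one node, assuming the
# re-replication work is spread evenly over the surviving nodes and each
# node can sustain a fixed recovery bandwidth (assumed 1 Gb/s here; real
# numbers depend on disks, network and CPU).

FAILED_NODE_DATA_TB = 36            # data that must be re-replicated
PER_NODE_BW_GBIT = 1.0              # assumed recovery bandwidth per node

def recovery_hours(total_nodes):
    surviving = total_nodes - 1
    data_bits = FAILED_NODE_DATA_TB * 1e12 * 8          # TB -> bits
    aggregate_bw = surviving * PER_NODE_BW_GBIT * 1e9   # bits/s
    return data_bits / aggregate_bw / 3600

for nodes in (3, 90):
    print(f"{nodes:3d} nodes: ~{recovery_hours(nodes):6.1f} h to re-replicate "
          f"{FAILED_NODE_DATA_TB} TB over {nodes - 1} surviving nodes")

With those assumptions the 3-node cluster needs on the order of 40 hours to get back to full redundancy, while the 90-node cluster needs roughly an hour, simply because the same 36TB is spread over many more senders and receivers.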

Stefan

