Re: Hardware recommendation / calculation for large cluster

On Sun, May 12, 2013 at 10:22:10PM +0200, Tim Mohlmann wrote:
> Hi,
> 
> On Sunday 12 May 2013 18:05:16 Leen Besselink wrote:
> 
> > 
> > I did see you mentioned you wanted to have many disks in the same machine.
> > 
> > Not just machines with, let's say, 12 disks.
> > 
> > Did you know you need the CPU power of a 1 GHz Xeon core per OSD for the
> > times when recovery is happening?
> Nope, did not know it.
> 
> The current intent is to install 2x 2.4 GHz Xeon CPUs, handling 8 threads 
> each. So, 2*8*2.4 = 38.4 for max OSDs. It should be fine.
> 
> If I go for the 72-disk option, I have to consider doubling that power. The 
> current max I can select from the dealer I am looking at, for the socket 
> used in the Supermicro 72x 3.5" version, is 2x Xeon X5680, each running 12 
> threads at 3.33 GHz. So, 2*12*3.33 = 79.92 for max OSDs. This should also 
> be fine.
> 
> What will happen if the CPU is maxed out anyway? Will it slow things down or 
> crash them? In my opinion it is not a bad thing if a system is maxed out 
> during such a massive migration, which should not occur on a daily basis. 
> Sure, a disk that fails every two weeks is no problem. What are we talking 
> about? 0.3% of the complete storage cluster, or even 0.15% if I take the 
> 72x 3.5" servers.
> 
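Your math gives plenty of headroom on paper. For what it's worth, here is a
quick back-of-the-envelope check of that sizing rule as a Python sketch (the
1 GHz-per-OSD figure is just the rule of thumb quoted above, and the function
name is mine):

    # Rough CPU sizing for Ceph OSD nodes, assuming the rule of thumb
    # of ~1 GHz of Xeon core per OSD during recovery.
    def max_osds(cpus, threads_per_cpu, clock_ghz, ghz_per_osd=1.0):
        """Upper bound on OSDs one node can drive during recovery."""
        return cpus * threads_per_cpu * clock_ghz / ghz_per_osd

    # 36-disk chassis: 2x 2.4 GHz Xeons, 8 threads each -> 38.4
    print(max_osds(2, 8, 2.4))
    # 72-disk chassis: 2x Xeon X5680, 12 threads at 3.33 GHz -> 79.92
    print(max_osds(2, 12, 3.33))
    # Note: hyper-threads are not full cores, so the real margin is
    # smaller than these numbers suggest.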

Even if only one disk/OSD fails, the cluster needs to recheck where each
placement group should be stored and move data around if needed.

If your CPUs are maxed out during this process, you might start to lose
connections between OSDs, and the recovery will need to start over.

At least, that is how I understand it; I've done a few test installations but
have not yet deployed it in production.
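
If the recovery load itself is the worry, it can also be throttled so it
leaves CPU headroom for client traffic and the OSD heartbeats. A minimal
ceph.conf sketch; the option names exist, but check the documentation for
your release, and the values here are only illustrative:

    [osd]
        ; limit concurrent backfills and recovery ops per OSD so that
        ; client I/O and OSD heartbeats still get CPU time while
        ; placement groups are being moved around
        osd max backfills = 1
        osd recovery max active = 1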

In the presentations I've seen, the Inktank people said (and looking at the
picture in the video from DreamHost, I have a feeling that is what they've
deployed):

12 HDDs == 12 OSDs per machine is ideal, maybe with 2 or 3 SSDs for journaling
if you want more performance.
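
For reference, pointing the journal at an SSD is just a ceph.conf setting per
OSD. A sketch; the device path is made up for illustration, and 'osd journal
size' is in megabytes:

    [osd]
        ; 10 GB journal
        osd journal size = 10240

    [osd.0]
        ; put the journal on an SSD partition instead of the
        ; default location on the data disk
        osd journal = /dev/disk/by-partlabel/ceph-journal-0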

> If a complete server stops working, that is something else. But as I said in 
> a different branch of this thread: if that happens I will have other things 
> to worry about than a slow migration of data. As long as no data is lost, I 
> don't really care if it takes a bit longer.
> 
> Thanks for the advice.
> 
> Tim
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



