Hi Matteo,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Matteo Dacrema
> Sent: 11 November 2016 10:57
> To: Christian Balzer <chibi@xxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: 6 Node cluster with 24 SSD per node: Hardware planning / agreement
>
> Hi,
>
> after your tips and considerations I've planned to use this hardware configuration:
>
> - 4x OSD (for starting the project):
> • 1x Intel E5-1630v4 @ 4.00 GHz with turbo, 4 cores, 8 threads, 10MB cache
> • 128GB RAM (does frequency matter in terms of performance?)
> • 4x Intel P3700 2TB NVMe
> • 2x Mellanox ConnectX-3 Pro 40Gbit/s

I would maybe try and look at the higher core count 1600s; they might give you a bit more total performance, as you will need it for NVMe. I recently did some testing on how many MHz each Ceph IO roughly needs:

http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/

The figure will probably vary significantly depending on several factors, but it might be handy as a rough guide (there is a rough sizing sketch further down in this mail).

You might also want to see if you can get your hands on any of the 25/50/100Gb networking stuff. That gear is clocked a lot faster than the 10/40Gb products, so it will likely help with latency.

128GB of RAM may also be overkill; although extra RAM is always nice, for 8TB of storage 16GB is probably a sufficient amount.

> - 3x MON:
> • 1x Intel E5-1630v4
> • 64GB RAM
> • 2x Intel S3510 SSD
> • 2x Mellanox ConnectX-3 Pro 10Gbit/s

This looks fine for the MONs.

> What do you think about it?
> I don't know if this CPU works well with the Ceph workload, or whether it's better to use 4x Samsung SM863 1.92TB rather than the Intel P3700.
> I've considered placing the journal inline.
>
> Thanks
> Matteo
>
> On 11 Oct 2016, at 03:04, Christian Balzer <mailto:chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Mon, 10 Oct 2016 14:56:40 +0200 Matteo Dacrema wrote:
>
> > Hi,
> > I'm planning a similar cluster.
> > Because it's a new project I'll start with only a 2 node cluster, each with:
>
> As Wido said, that's a very dense and risky proposition for a first time
> cluster.
> Never mind that the lack of a 3rd node for 3 MONs is begging for Murphy to
> come and smite you.
>
> While I understand the need/wish to save money and space by maximizing
> density, that only works (sort of) when you have plenty of such nodes to
> begin with.
>
> Your proposed setup isn't cheap to begin with, so consider alternatives like
> the one I'm pointing out below.
>
> > 2x E5-2640v4 with 40 threads total @ 3.40GHz with turbo
>
> Spendy and still potentially overwhelmed when dealing with small write
> IOPS.
>
> > 24x 1.92TB Samsung SM863
>
> Should be fine, but keep in mind that with inline journals they will only
> have about 1.5 DWPD endurance.
> At about 5.7GB/s write bandwidth, not a total mismatch for your 4GB/s
> network link (unless those 2 ports are MC-LAG, giving you 8GB/s).
>
> > 128GB RAM
> > 3x LSI 3008 in IT mode / HBA for OSDs - 1 per 8 OSDs/SSDs
>
> Also not free, and they need to be on the latest FW and kernel version to
> work reliably with SSDs.
>
> > 2x SSD for OS
> > 2x 40Gbit/s NIC
>
> Consider basing your cluster on two of these 2U 4-node servers:
> https://www.supermicro.com.tw/products/system/2U/2028/SYS-2028TP-HTTR.cfm
>
> Built-in dual 10Gb/s, the onboard SATA works nicely with SSDs, and you can
> get better matched CPU(s).
>
> 10Gb/s MC-LAG (white box) switches are also widely available and
> affordable.
>
> So 8 nodes instead of 2, in the same space.
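To put rough numbers on the CPU, RAM and inline-journal points above, here is a quick back-of-the-envelope script. Every constant in it is an assumption or rule of thumb (in particular the MHz-per-IO figure, which you should take from your own testing or the post linked above), so treat the output as a sanity check rather than a spec:

#!/usr/bin/env python3
# Back-of-the-envelope sizing for an all-flash OSD node.
# Every constant below is an assumption / rule of thumb for illustration;
# replace them with your own fio results and datasheet values.

MHZ_PER_IO    = 2.0    # placeholder CPU cost per Ceph IO -- measure this yourself
CORES         = 4      # E5-1630v4: 4 cores
CORE_MHZ      = 4000   # ~4.0 GHz; ignore hyperthreading for a crude ceiling

RAW_TB        = 8      # 4x 2TB P3700 per node
GB_RAM_PER_TB = 2      # rule of thumb; gives the ~16GB for 8TB mentioned above

SSDS          = 24     # the 24x SM863 design discussed further up
SSD_WRITE_MBS = 475    # assumed per-drive sequential write speed
JOURNAL_AMP   = 2      # inline journals write every byte twice
RATED_DWPD    = 3.0    # assumed datasheet endurance -- check your drive

# Rough IOPS ceiling the CPU can sustain
iops_ceiling = CORES * CORE_MHZ / MHZ_PER_IO
print(f"CPU ceiling : ~{iops_ceiling:,.0f} IOPS at {MHZ_PER_IO} MHz per IO")

# Rule-of-thumb RAM sizing
print(f"RAM         : ~{RAW_TB * GB_RAM_PER_TB} GB for {RAW_TB} TB of OSD storage")

# Effect of inline journals on node bandwidth and drive endurance
node_write = SSDS * SSD_WRITE_MBS / JOURNAL_AMP / 1000
print(f"Node writes : ~{node_write:.1f} GB/s with inline journals "
      f"(vs ~4 GB/s for one 40GbE link, ~8 GB/s with MC-LAG)")
print(f"Endurance   : ~{RATED_DWPD / JOURNAL_AMP:.1f} DWPD effective "
      f"(rated {RATED_DWPD} DWPD halved by the journal writes)")

With the placeholder 2 MHz per IO the 4-core E5-1630v4 tops out around 8k small-write IOPS, which is why a higher core count part might be worth considering; plug in your own measured figure to see where you actually land.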
>
> Of course running a cluster (even with well monitored and reliable SSDs)
> with a replication of 2 has risks (and that risk increases with the size of
> the SSDs), so you may want to reconsider that.
>
> Christian
>
> > What about this hardware configuration? Is it wrong or am I missing something?
> >
> > Regards
> > Matteo
>
> On 06 Oct 2016, at 13:52, Denny Fuchs <mailto:linuxmail@xxxxxxxx> wrote:
>
> Good morning,
>
> > > * 2x SN2100 100Gb/s switch, 16 ports
> >
> > Which incidentally is a half sized (identical HW really) Arctica 3200C.
>
> I really never heard of them :-) (and didn't find any price in the €/$ region)
>
> > > * 10x ConnectX-4 Lx EN 25Gb cards for hypervisor and OSD nodes
> > [...]
> >
> > You haven't commented on my rather lengthy mail about your whole design,
> > so to reiterate:
>
> Maybe accidentally skipped, so much new input :-) sorry
>
> > The above will give you a beautiful, fast (but I doubt you'll need the
> > bandwidth for your DB transactions), low latency and redundant network
> > (these switches do/should support MC-LAG).
>
> Yep, they do MLAG (with the 25Gbit version of the CX4 NICs).
>
> > In more technical terms, your network as depicted above can handle under
> > normal circumstances around 5GB/s, while your OSD nodes can't write more
> > than 1GB/s.
> > Massive, wasteful overkill.
>
> Before we started planning the Ceph / new hypervisor design, we were sure
> that our network would be more powerful than we need in the near future.
> Our applications / DB never used the full 1Gb/s in any way ... we are losing
> speed on the plain (painful LANCOM) switches and in the applications (mostly
> Perl, written in the beginning of 2005).
> But anyway, the network should have enough capacity for the next few years,
> because it is much more complicated to change network (design) components
> than to kick a node.
>
> > With a 2nd NVMe in there you'd be at 2GB/s, or simple overkill.
>
> We would buy them ... so that in the end, every 12 disks have a separate NVMe.
>
> > With decent SSDs and in-line journals (400GB DC S3610s) you'd be at 4.8
> > GB/s, a perfect match.
>
> What about the worst case, two nodes are broken, fixed and replaced? I read
> (a lot) that some Ceph users had massive problems while the rebuild runs.
>
> > Of course if your I/O bandwidth needs are actually below 1GB/s at all times
> > and all you care about is reducing latency, a single NVMe journal will be
> > fine (but also be a very obvious SPoF).
>
> Very happy to put the finger in the wound; a SPoF ... is a very hard thing ...
> so we try to plan everything redundantly :-)
>
> The bad side of life: the SSDs themselves. A consumer SSD costs around
> 70-80€, a DC SSD jumps up to 120-170€. My nightmare is a lot of SSDs
> jumping over the bridge at the same time ... -> arghh
>
> But, we are working on it :-)
>
> I've been searching for an alternative to the Asus board with more PCIe
> slots and maybe some other components; a better CPU with 3.5GHz+; maybe a
> mix of SSDs ...
>
> At this time, I've found the X10DRi:
>
> https://www.supermicro.com/products/motherboard/xeon/c600/x10dri.cfm
>
> and I think we'll use the E5-2637v4 :-)
>
> cu denny
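To illustrate the journal placement arithmetic quoted above, a minimal "slowest link wins" sketch; the per-device figures (roughly 1GB/s per NVMe journal, roughly 400MB/s per SATA SSD, roughly 5GB/s of usable network) are assumptions for illustration only:

#!/usr/bin/env python3
# Crude model of where a node's sustained client write throughput tops out:
# the slowest of network, journal device(s) and the data SSDs.
# All per-device figures are assumptions for illustration.

def node_write_ceiling_mbs(net_mbs, data_mbs, journal_mbs=None):
    """Return a rough ceiling (MB/s) on client writes for one OSD node."""
    if journal_mbs is None:
        # Inline journals: journal and data share the same SSDs, so every
        # client write hits the flash twice and usable bandwidth halves.
        return min(net_mbs, data_mbs / 2)
    # Dedicated journal device(s): all writes funnel through them first.
    return min(net_mbs, journal_mbs, data_mbs)

NET_MBS  = 5000        # ~2x 25GbE after overhead (assumed)
DATA_MBS = 24 * 400    # 24 SATA SSDs at ~400 MB/s each (assumed)

for label, ceiling in [
    ("one NVMe journal ", node_write_ceiling_mbs(NET_MBS, DATA_MBS, journal_mbs=1000)),
    ("two NVMe journals", node_write_ceiling_mbs(NET_MBS, DATA_MBS, journal_mbs=2000)),
    ("inline journals  ", node_write_ceiling_mbs(NET_MBS, DATA_MBS)),
]:
    print(f"{label}: ~{ceiling / 1000:.1f} GB/s")

It also makes the SPoF trade-off visible: with a single dedicated journal device the whole node is capped by (and depends on) that one NVMe, while inline journals spread both the load and the failure domain across all the data SSDs.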
>
> --
> Christian Balzer           Network/Systems Engineer
> mailto:chibi@xxxxxxx       Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com