It would really help to have a better understanding of your application's
needs, IOPS versus bandwidth, etc.

If, for example, your DB transactions are small but plentiful (something
like 2000 transactions per second) against a well-defined and not too
large a working set, and all your other I/O needs are mostly reads from
your web applications, then a very different design comes to mind:

- An adequately sized cache tier or dedicated pool for DB usage based on
  NVMes (like 6x 800GB).
- Plain SATA HDDs with SSD journals (5:1 ratio, 4 SSDs, 20 HDDs).

If going for a cache tier, set it to read-forward, so cache-miss reads go
directly to the HDDs while time-critical writes (and consequently reads of
the same data) go to the NVMes; a rough command sketch is at the very
bottom of this mail.

Christian

On Thu, 6 Oct 2016 10:04:41 +0900 Christian Balzer wrote:

> 
> Hello,
> 
> On Wed, 05 Oct 2016 13:43:27 +0200 Denny Fuchs wrote:
> 
> > hi,
> > 
> > I got a call from Mellanox and we now have an offer for the following
> > network:
> > 
> > * 2 x SN2100 100Gb/s switch, 16 ports
> Which incidentally is a half-sized (identical HW, really) Arctica 3200C.
> 
> > * 10 x ConnectX-4 Lx EN 25Gb cards for the hypervisor and OSD nodes
> > * 4 x Mellanox QSA adapters to SFP+ ports for interconnecting to our
> >   HP 2920 switches
> > * 3 x copper split cables, 1 x 100Gb -> 4 x 25Gb
> > 
> 
> You haven't commented on my rather lengthy mail about your whole design,
> so to reiterate:
> 
> The above will give you a beautiful, fast (but I doubt you'll need the
> bandwidth for your DB transactions), low-latency and redundant network
> (these switches do/should support MC-LAG).
> 
> But unless you make significant changes to the rest of your design, that's
> akin to putting a Formula 1 engine into a tractor, with the gearbox
> limiting your top speed (the journal NVMe) and the wheels likely to fall
> off (your consumer SSDs).
> 
> In more technical terms, your network as depicted above can handle under
> normal circumstances around 5GB/s, while your OSD nodes can't write more
> than 1GB/s.
> Massive, wasteful overkill.
> 
> With a 2nd NVMe in there you'd be at 2GB/s, or simple overkill.
> 
> With decent SSDs and in-line journals (400GB DC S3610s) you'd be at
> 4.8GB/s, a perfect match.
> 
> Of course if your I/O bandwidth needs are actually below 1GB/s at all
> times and all you care about is reducing latency, a single NVMe journal
> will be fine (but also be a very obvious SPoF).
> 
> Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
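
For illustration, a minimal sketch of the cache-tier setup described above.
It assumes a replicated HDD-backed pool named "hdd-pool" and an NVMe-backed
pool named "nvme-cache" already exist and map to the right OSDs via their
own CRUSH rules; the pool names, hit-set settings and size thresholds are
placeholders to adapt to your actual working set:

    # Attach the NVMe pool as a cache tier in front of the HDD pool
    ceph osd tier add hdd-pool nvme-cache
    # read-forward: writes land on the NVMe tier, cache-miss reads are
    # redirected to the HDD pool
    ceph osd tier cache-mode nvme-cache readforward
    # Route client traffic for hdd-pool through the cache tier
    ceph osd tier set-overlay hdd-pool nvme-cache
    # Hit-set tracking plus flush/evict thresholds (example values only)
    ceph osd pool set nvme-cache hit_set_type bloom
    ceph osd pool set nvme-cache hit_set_count 1
    ceph osd pool set nvme-cache hit_set_period 3600
    ceph osd pool set nvme-cache target_max_bytes 1000000000000
    ceph osd pool set nvme-cache cache_target_dirty_ratio 0.4
    ceph osd pool set nvme-cache cache_target_full_ratio 0.8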