It would really help to have a better understanding of your application's
needs, IOPS versus bandwidth, etc.

If, for example, your DB transactions are small but plentiful (something
like 2000 transactions per second) against a well-defined and not too
large a working set, and all your other I/O needs are mostly reads from
your web applications, then a very different design comes to mind:

- An adequately sized cache tier or dedicated pool for DB usage based on
  NVMes (like 6x 800GB).
- Plain SATA HDDs with SSD journals (5:1 ratio, 4 SSDs, 20 HDDs).

If going for a cache tier, set it to read-forward, so cache-miss reads go
directly to the HDDs while time-critical writes (and consequently reads of
the same data) go to the NVMes; a rough command sketch is at the very
bottom of this mail.

Christian

On Thu, 6 Oct 2016 10:04:41 +0900 Christian Balzer wrote:

> 
> Hello,
> 
> On Wed, 05 Oct 2016 13:43:27 +0200 Denny Fuchs wrote:
> 
> > hi,
> > 
> > I got a call from Mellanox and we now have an offer for the following
> > network:
> > 
> > * 2 x SN2100 100Gb/s switch, 16 ports
> Which incidentally is a half-sized (identical HW, really) Arctica 3200C.
> 
> > * 10 x ConnectX-4 Lx EN 25Gb cards for the hypervisor and OSD nodes
> > * 4 x Mellanox QSA adapters to SFP+ ports for interconnecting to our
> >   HP 2920 switches
> > * 3 x copper split cables, 1 x 100Gb -> 4 x 25Gb
> > 
> 
> You haven't commented on my rather lengthy mail about your whole design,
> so to reiterate:
> 
> The above will give you a beautiful, fast (but I doubt you'll need the
> bandwidth for your DB transactions), low-latency and redundant network
> (these switches do/should support MC-LAG).
> 
> But unless you make significant changes to the rest of your design, that's
> akin to putting a Formula 1 engine into a tractor, with the gearbox
> limiting your top speed (the journal NVMe) and the wheels likely to fall
> off (your consumer SSDs).
> 
> In more technical terms, your network as depicted above can handle under
> normal circumstances around 5GB/s, while your OSD nodes can't write more
> than 1GB/s.
> Massive, wasteful overkill.
> 
> With a 2nd NVMe in there you'd be at 2GB/s, or simple overkill.
> 
> With decent SSDs and in-line journals (400GB DC S3610s) you'd be at
> 4.8GB/s, a perfect match.
> 
> Of course if your I/O bandwidth needs are actually below 1GB/s at all
> times and all you care about is reducing latency, a single NVMe journal
> will be fine (but also be a very obvious SPoF).
> 
> Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
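
For illustration, a minimal sketch of the cache-tier setup described above.
It assumes a replicated HDD-backed pool named "hdd-pool" and an NVMe-backed
pool named "nvme-cache" already exist and map to the right OSDs via their
own CRUSH rules; the pool names, hit-set settings and size thresholds are
placeholders to adapt to your actual working set:

    # Attach the NVMe pool as a cache tier in front of the HDD pool
    ceph osd tier add hdd-pool nvme-cache
    # read-forward: writes land on the NVMe tier, cache-miss reads are
    # redirected to the HDD pool
    ceph osd tier cache-mode nvme-cache readforward
    # Route client traffic for hdd-pool through the cache tier
    ceph osd tier set-overlay hdd-pool nvme-cache
    # Hit-set tracking plus flush/evict thresholds (example values only)
    ceph osd pool set nvme-cache hit_set_type bloom
    ceph osd pool set nvme-cache hit_set_count 1
    ceph osd pool set nvme-cache hit_set_period 3600
    ceph osd pool set nvme-cache target_max_bytes 1000000000000
    ceph osd pool set nvme-cache cache_target_dirty_ratio 0.4
    ceph osd pool set nvme-cache cache_target_full_ratio 0.8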