Re: New Ceph cluster design

Hello,

On Sat, 10 Mar 2018 16:14:53 +0100 Vincent Godin wrote:

> Hi,
> 
> As I understand it, you'll have one RAID1 of two SSDs for 12 HDDs. A
> WAL is used for all writes on your host. 

This isn't filestore; AFAIK with Bluestore the WAL/DB will be used for
small writes only, to keep latency at filestore-like levels.
Large writes will go directly to the HDDs.

However, each write will of course necessitate a write to the DB, and thus
IOPS (much more so than bandwidth) are paramount here.
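
To illustrate the split (a rough sketch in Python, not Ceph's actual code
path; the option name bluestore_prefer_deferred_size_hdd and its ~32KiB
default are from memory, so check the documentation of your release):
---
# Rough model of Bluestore's write path for an HDD OSD with WAL/DB on SSD.
# Small writes are staged (deferred) in the WAL/DB on the SSD and flushed
# to the HDD later; large writes go straight to the HDD, but their metadata
# update still lands in the DB.

DEFERRED_THRESHOLD = 32 * 1024   # bluestore_prefer_deferred_size_hdd (assumed default)

def write_path(write_size_bytes):
    if write_size_bytes <= DEFERRED_THRESHOLD:
        return "deferred via WAL/DB on the SSD (low latency, costs SSD IOPS)"
    return "direct to HDD (SSD only sees the RocksDB metadata update)"

for size in (4 * 1024, 64 * 1024, 4 * 1024 * 1024):
    print("%7d bytes -> %s" % (size, write_path(size)))
---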

> If you have good SSDs, they
> can handle 450-550 MBps. Your 12 SATA HDDs can handle 12 x 100 MBps
> that is to say 1200 GBps. 

Aside from what I wrote above, I'd like to repeat myself and others here
for the umpteenth time: focusing on bandwidth is a fallacy in nearly all
use cases; IOPS tend to become the bottleneck.

Also, that's 1.2GB/s or 1200MB/s, not 1200GBps.
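
For reference, the arithmetic (plain Python, using the 100MBps-per-HDD
figure quoted above):
---
hdds = 12
per_hdd_mb_s = 100                        # sequential writes, best case
aggregate = hdds * per_hdd_mb_s
print(aggregate, "MB/s =", aggregate / 1000.0, "GB/s")    # 1200 MB/s = 1.2 GB/s
---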

The OP stated 10TB HDDs and many (but not exclusively?) small objects.
So if we're looking at lots of small writes, the bandwidth of the SSDs
becomes a factor again, and with the sizes involved they appear too small
as well (going by the rough ratio of 10GB of DB space per TB of HDD).

Either a RAID1 of at least 1600GB NVMes, or 2 standalone 800GB NVMes with a
resulting failure domain of 6 HDDs each, would be a better/safer fit.
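
A back-of-the-envelope sizing check (a sketch; the 10GB per TB is the rough
ratio mentioned above, not an official requirement):
---
hdds = 12
hdd_tb = 10
db_gb_per_tb = 10                          # rough rule of thumb from above

db_needed_gb = hdds * hdd_tb * db_gb_per_tb
print("DB space for the node:", db_needed_gb, "GB")          # 1200 GB -> >= 1600GB RAID1
print("DB space per 6-HDD group:", db_needed_gb // 2, "GB")  # 600 GB -> one 800GB NVMe each
---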

> So your RAID 1 will be the bottleneck with
> this design. A good design would be to have one SSD for 4 or 5 HDDs. In
> your case, the best option would be to start with 3 SSDs for 12 HDDs
> to have a balanced node. Don't forget to choose SSDs with a high DWPD
> ratio (>10)
> 
More SSDs/NVMes are of course better, and DWPD is important, but probably
less so than with filestore journals.
A DWPD of >10 is overkill for anything I've ever encountered; for many
workloads 3 will be fine, especially if one knows what load to expect.

For example, a filestore cache-tier SSD with inline journal (800GB DC S3610,
3 DWPD) has a media wearout indicator of 97 (3% used) after 2 years under
this constant and not insignificant load:
---
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.03    83.09    7.07  303.24   746.64  5084.99    37.59     0.05    0.15    0.71    0.13   0.06   2.00
---

300 write IOPS and 5MB/s for all that time.
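
Working backwards from those numbers (a rough check in Python, ignoring
write amplification inside the drive):
---
capacity_gb = 800                  # DC S3610
rated_dwpd = 3
write_mb_s = 5.0                   # ~5085 kB/s from the iostat output above
years = 2

written_tb = write_mb_s * 86400 * 365 * years / 1e6          # host writes in TB
observed_dwpd = write_mb_s * 86400 / 1000.0 / capacity_gb    # drive writes per day

print("written over %d years: ~%.0f TB" % (years, written_tb))            # ~315 TB
print("sustained DWPD: ~%.2f vs rated %d" % (observed_dwpd, rated_dwpd))  # ~0.54 vs 3
---
At roughly half a drive write per day the 3 DWPD rating has plenty of
headroom, which is consistent with only 3% of the wearout being used.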

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications