Re: pros/cons of multiple OSD's per host

On Tue, 22 Aug 2017 09:32:20 +0800 Nick Tan wrote:

> On Mon, Aug 21, 2017 at 8:57 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
> 
> > It is not recommended to get your cluster more than 70% full due to
> > rebalancing and various other reasons. That would change your 12x 10TB
> > disks in a host to only be 84TB if you filled your cluster to 70% full. I
> > still think that the most important aspects of what is best for you haven't
> > been provided, as none of us know what type of CephFS usage you are planning
> > on.  Are you writing once and reading forever? Using this for home
> > directories? doing processing of files in it? etc...  Each instance is
> > different and would have different hardware and configuration requirements.
> >
> >
> >  
> Hi David,
> 
> The planned usage for this CephFS cluster is scratch space for an image
> processing cluster with 100+ processing nodes.

Lots of clients. How much data movement would you expect? How many images
come in per timeframe, let's say an hour, and what's the typical size of an
image?

Does an image come in and then get processed by one processing node?
Is it unlikely to be touched again, at least in the short term?
And is it probably deleted after being processed?
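To make that concrete with purely made-up numbers (plug in your real ones
once you have answers to the above), the ingest rate is what ultimately
sizes your network and disks:

    # hypothetical example only: 3600 images/hour at ~100 MB each
    echo "$(( 3600 * 100 / 3600 )) MB/s sustained ingest"   # ~100 MB/s
    # add read-back by the processing nodes on top, plus the write
    # amplification from Ceph replication (3x with the default size)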

>  My thinking is we'd be
> better off with a large number (100+) of storage hosts with 1-2 OSDs each,
> rather than 10 or so storage nodes with 10+ OSDs, to get better parallelism,
> but I don't have any practical experience with CephFS to really judge.  
CephFS is one thing (of which I have very limited experience), but at this
point you're talking about parallelism in Ceph itself (RADOS), and that
happens at the OSD level much more than at the host level.

Which you _can_ achieve with larger nodes, if they're well designed,
meaning CPU, RAM, internal storage bandwidth and network bandwidth are in
"harmony".

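As a rough back-of-the-envelope check for such a node (all numbers below
are assumptions, substitute your actual hardware specs):

    # 12x 7.2k HDDs at ~150 MB/s streaming each vs. 2x 10GbE bonded
    echo "disk streaming: $(( 12 * 150 )) MB/s"   # ~1800 MB/s
    echo "network:        $(( 2 * 1200 )) MB/s"   # ~2400 MB/s usable
    echo "disk IOPS:      $(( 12 * 120 ))"        # ~1440 random IOPS
    # streaming-wise this is roughly balanced; random IOPS will be the
    # limit long before either, which is why more/smaller spindles and
    # fast journal/WAL/DB devices help
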
Also, you keep talking about really huge HDDs; you could do worse than
halving their size and doubling their number to get much more bandwidth
and the ever-crucial IOPS (even in your use case).

So something like 20x 12-HDD servers, with SSDs/NVMes for the journals or
BlueStore WAL/DB if you can afford it or actually need it.
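One way such a node could be laid out with BlueStore, shown here with
ceph-volume syntax (device names are placeholders, adjust for your release
and hardware):

    # one OSD per HDD, with its BlueStore DB on a dedicated NVMe partition;
    # the WAL lives with the DB unless you point it elsewhere explicitly
    ceph-volume lvm create --bluestore \
        --data /dev/sdb \
        --block.db /dev/nvme0n1p1
    # repeat per HDD, one DB partition per OSD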

CephFS metadata on an SSD pool isn't the most dramatic improvement one can
make (or so people tell me), but given your budget it may be worthwhile.
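If you go that route, a sketch of how it could look, assuming Luminous
device classes (pool names and PG counts are placeholders for illustration):

    # CRUSH rule that only picks SSD OSDs, then a metadata pool on it
    ceph osd crush rule create-replicated ssd-only default host ssd
    ceph osd pool create cephfs_metadata 64 64 replicated ssd-only
    ceph osd pool create cephfs_data 1024
    ceph fs new cephfs cephfs_metadata cephfs_data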


> And
> I don't have enough hardware to set up a test cluster of any significant
> size to run some actual testing.
> 
You may want to set up something small regardless, to get a feel for CephFS
and whether it's right for you, or whether something else on top of RBD may
be more suitable.
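Even three old boxes (or VMs) are enough to mount it and throw your actual
access pattern at it, something along these lines (monitor address,
credentials and sizes are placeholders):

    # kernel CephFS mount, then a crude parallel-writer test with fio
    mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret
    fio --name=imgwrite --directory=/mnt/cephfs --rw=write --bs=4M \
        --size=2G --numjobs=8 --group_reporting
    # for comparison, the same fio run against a mapped RBD image:
    rbd create scratch --size 102400
    rbd map scratch    # shows up as /dev/rbd0, put a filesystem on it first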


Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


