Re: OSD node memory sizing

Christian Balzer <chibi@xxxxxxx> · Thu, 19 May 2016 10:36:50 +0900

Hello again,

On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:

> Hello Christian,
> 
> > Hello,
> > 
> > On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> > 
> >> Dear Ceph users,
> >>
> >> I've a question regarding the memory recommendations for an OSD node.
> >>
> >> The official Ceph hardware recommendations say that an OSD node should
> >> have 1GB of RAM per TB of OSD storage [1]
> >>
> >> The "Reference Architecture" whitepaper from Red Hat & Supermicro says
> >> that "typically" 2GB of memory per OSD on an OSD node is used. [2]
> >>
> > This question has been asked and answered here countless times.
> > 
> > Maybe something a bit more detailed ought to be placed in the first
> > location, or simply a reference to the 2nd one. 
> > But then again, that would detract from the RH added value.
> 
> thanks for replying, nonetheless.
> I checked the list before but I failed to find a definitive answer,
> maybe I was not looking hard enough. Anyway, thanks!
> 
They tend to be hidden in other threads sometimes, but there really are a
lot of them.

> >  
> >> According to the recommendation in [1] an OSD node with 24x 8TB OSD
> >> disks is "underpowered" when it is equipped with 128GB of RAM.
> >> However, following the "recommendation" in [2] 128GB should be
> >> plenty.
> >>
> > It's fine per se, the OSD processes will not consume all of that even
> > in extreme situations.
> 
> Ok, if I understood this correctly, then 128GB should be enough also
> during rebalancing or backfilling.
> 
Definitely, but realize that during this time of high memory consumption
caused by backfilling your system is also under strain from objects moving
in and out, so as per the high-density thread you will want all your dentry
and other important SLAB objects to stay in RAM.

With 8TB OSDs that's potentially a lot of objects, so when choosing DIMMs
pick ones that leave you with the option to go to 256GB later if need be.
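
To put numbers on the two rules of thumb, here is a rough sketch of the
arithmetic for the node discussed in this thread (24 OSDs of 8TB each,
128GB installed). The function and variable names are made up for
illustration, and the "headroom" figure simply assumes the 2GB per OSD
estimate from [2] holds; it is not a measurement.

```python
# Figures taken from this thread: 24 OSDs per node, 8 TB each, 128 GB RAM.
OSDS_PER_NODE = 24
TB_PER_OSD = 8
INSTALLED_GB = 128


def ram_by_capacity(osds, tb_per_osd, gb_per_tb=1):
    """Rule [1]: 1 GB of RAM per TB of OSD capacity."""
    return osds * tb_per_osd * gb_per_tb


def ram_by_osd_count(osds, gb_per_osd=2):
    """Rule [2]: roughly 2 GB of RAM per OSD process."""
    return osds * gb_per_osd


by_capacity = ram_by_capacity(OSDS_PER_NODE, TB_PER_OSD)  # 192 GB
by_count = ram_by_osd_count(OSDS_PER_NODE)                # 48 GB

# ASSUMPTION: whatever the OSD processes do not use is left for page cache
# and SLAB objects (dentries, inodes). This is the part you want to keep
# large on dense nodes.
headroom_gb = INSTALLED_GB - by_count                     # 80 GB
headroom_gb_at_256 = 256 - by_count                       # 208 GB

print(f"rule [1]: {by_capacity} GB, rule [2]: {by_count} GB")
print(f"headroom at 128 GB: {headroom_gb} GB, at 256 GB: {headroom_gb_at_256} GB")
```

By rule [1] the node comes out 64GB short, by rule [2] it has 80GB to
spare, which is why the upgrade path to 256GB is worth keeping open.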

Also you'll probably have loads of fun playing with CRUSH weights to keep
the utilization of these 8TB OSDs within 100GB of each other. 
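
As an illustration of what "within 100GB of each other" means in
practice, the sketch below checks a set of per-OSD usage figures and
lists the OSDs that sit too far from the mean. The helper names and the
sample numbers are invented for this example; real figures would come
from something like "ceph osd df".

```python
def utilization_spread_gb(used_gb):
    """Gap between the fullest and the emptiest OSD, in GB."""
    return max(used_gb.values()) - min(used_gb.values())


def outliers(used_gb, tolerance_gb=100):
    """OSDs more than tolerance_gb / 2 away from the mean usage.

    Returns a dict of OSD name -> deviation from the mean in GB
    (positive means fuller than average).
    """
    mean = sum(used_gb.values()) / len(used_gb)
    half = tolerance_gb / 2
    return {
        osd: round(gb - mean, 1)
        for osd, gb in used_gb.items()
        if abs(gb - mean) > half
    }


# Made-up sample values, NOT taken from any real cluster.
sample = {"osd.0": 5200, "osd.1": 5260, "osd.2": 5110, "osd.3": 5230}

print(utilization_spread_gb(sample))  # 150
print(outliers(sample))               # {'osd.1': 60.0, 'osd.2': -90.0}
```

The OSDs flagged here would be the candidates for a small adjustment
with "ceph osd crush reweight"; how much to change the weight by is a
matter of trial and error and is not modelled in this sketch.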

> > 
> > Very large OSDs and high density storage nodes have other issues and
> > challenges, tuning and memory wise.
> > There are several threads about these recently, including today.
> 
> Thanks, I'll study these...
> 
> >> I'm wondering which of the two is good enough for a Ceph cluster with
> >> 10 nodes using EC (6+3)
> >>
> > I would spend more time pondering the CPU power of these machines
> > (EC needs more) and what cache tier to get.
> 
> We are planning to equip the OSD nodes with 2x2650v4 CPUs (24 cores @
> 2.2GHz), that is 1 core/OSD. For the cache tier each OSD node gets two
> 800GB NVMe drives. We hope this setup will give reasonable performance
> with EC.
> 
So you actually have 26 OSDs per node then.
I'd say the CPUs are fine, but EC and the NVMes will eat a fair share of
them.
That's why I prefer to have dedicated cache tier nodes with fewer but
faster cores, unless the cluster is going to be very large.
With Hammer an 800GB DC S3160 SSD based OSD can easily saturate an
"E5-2623 v3" core @3.3GHz (nearly 2 cores to be precise) and Jewel has
optimizations that will both make it faster by itself AND enable it to
use more CPU resources as well.
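
A back-of-the-envelope version of that CPU budget, as a sketch only. It
ASSUMES each NVMe OSD costs about 2 cores, reusing the Hammer SATA SSD
figure above; that figure was seen on 3.3GHz cores, so on 2.2GHz cores
with faster devices the real cost is likely higher. The names are
invented for this example.

```python
TOTAL_CORES = 24           # 2x E5-2650 v4 with 12 cores each, per the thread
HDD_OSDS = 24              # 8 TB disks
NVME_OSDS = 2              # cache tier devices, each one is an OSD as well
CORES_PER_FLASH_OSD = 2.0  # ASSUMPTION: the "nearly 2 cores" SSD figure


def cores_left_per_hdd_osd(total_cores, hdd_osds, flash_osds, cores_per_flash_osd):
    """Cores available to each HDD OSD after the flash OSDs take their share."""
    remaining = total_cores - flash_osds * cores_per_flash_osd
    return remaining / hdd_osds


# Naive view: every OSD gets an equal slice of the CPUs.
naive = TOTAL_CORES / (HDD_OSDS + NVME_OSDS)

# Adjusted view: flash OSDs are served first, HDD OSDs share the rest.
adjusted = cores_left_per_hdd_osd(
    TOTAL_CORES, HDD_OSDS, NVME_OSDS, CORES_PER_FLASH_OSD
)

print(f"naive: {naive:.2f} cores/OSD, adjusted: {adjusted:.2f} cores/HDD OSD")
# naive: 0.92 cores/OSD, adjusted: 0.83 cores/HDD OSD
```

The EC overhead is not included here at all, so the per-HDD-OSD figure
is an upper bound under these assumptions.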

The NVMes (DC P3700 one presumes?) are just for cache tiering, with no
SSD journals for the OSDs?
What are your network plans then, as in, is your node storage bandwidth
a good match for your network bandwidth?
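
One way to frame the storage versus network question is the sketch
below. Every throughput number in it is an ASSUMED placeholder (150MB/s
per HDD, two 10Gbit/s links), since neither the disks' real throughput
nor the network design has been stated in this thread; plug in your own
values.

```python
def gbit_to_mbyte_per_s(gbit):
    """Link speed in Gbit/s to MB/s (decimal units, no protocol overhead)."""
    return gbit * 1000 / 8


def node_storage_mb_s(hdd_count, hdd_mb_s, flash_count=0, flash_mb_s=0):
    """Aggregate raw device throughput of one node in MB/s."""
    return hdd_count * hdd_mb_s + flash_count * flash_mb_s


# ASSUMED placeholder figures, not taken from the thread.
HDD_MB_S = 150   # sequential throughput per 8 TB disk
LINK_GBIT = 10   # speed of one network link
LINKS = 2        # number of links per node

storage = node_storage_mb_s(24, HDD_MB_S)          # 3600 MB/s
network = LINKS * gbit_to_mbyte_per_s(LINK_GBIT)   # 2500.0 MB/s
ratio = storage / network                          # 1.44

print(f"storage {storage} MB/s vs network {network:.0f} MB/s, ratio {ratio:.2f}")
```

A ratio well above 1 would mean the network is the bottleneck for
sequential work, a ratio well below 1 that the disks are. Real OSD
throughput under EC and journaling will be lower than the raw device
numbers used here.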

> > That is, if performance is a requirement in your use case.
> 
> Always, who wouldn't care about performance?  :-)
> 
"Good enough" sometimes really is good enough.

Since you're going for 8TB OSDs, EC and 10 nodes it feels that for you
space is important, so something like archival, not RBD images for high
performance VMs.

What is your use case?

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com