Re: RAM recommendation with large OSDs?

The problem with lots of OSDs per node is that it usually means you
have too few nodes. It's perfectly fine to run 60 OSDs per node if you
have a total of 1000 OSDs or so.
But I've seen too many setups with 3-5 nodes where each node runs 60
OSDs, which makes no sense (and usually isn't even cheaper than more
nodes, especially once you consider the lost opportunity to run
erasure coding).
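
(Rough arithmetic, not tied to any particular cluster: host-level k+m
erasure coding needs at least k+m hosts to place a stripe, so

    8+3  ->  at least 11 hosts, ideally a couple more so recovery has
             somewhere to go when a host fails

With 3-5 nodes you're stuck with replication or an OSD-level failure
domain, which defeats much of the point.)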

The typical backup cluster we see is in the single-digit petabyte
range, with about 12 to 24 disks per server running ~8+3 erasure
coding.
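
As a sketch of what that looks like (the profile name and sizes below
are just illustrative):

    ceph osd erasure-code-profile set backup_ec \
        k=8 m=3 crush-failure-domain=host

    # e.g. 20 servers x 18 disks x 12 TB = 4320 TB raw
    #      4320 TB x 8/11               = ~3.1 PB usable at 8+3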

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Oct 2, 2019 at 12:53 AM Darrell Enns <darrelle@xxxxxxxxxxxx> wrote:
>
> Thanks Paul. I was speaking more about total OSDs and RAM, rather than a single node. However, I am considering building a cluster with a large OSD/node count. This would be for archival use, with reduced performance and availability requirements. What issues would you anticipate with a large OSD/node count? Is the concern just the large rebalance if a node fails and takes out a large portion of the OSDs at once?
>
> -----Original Message-----
> From: Paul Emmerich <paul.emmerich@xxxxxxxx>
> Sent: Tuesday, October 01, 2019 3:00 PM
> To: Darrell Enns <darrelle@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re:  RAM recommendation with large OSDs?
>
> On Tue, Oct 1, 2019 at 6:12 PM Darrell Enns <darrelle@xxxxxxxxxxxx> wrote:
> >
> > The standard advice is “1GB RAM per 1TB of OSD”. Does this actually still hold with large OSDs on bluestore?
>
> No
>
> > Can it be reasonably reduced with tuning?
>
> Yes
>
>
> > From the docs, it looks like bluestore should target the “osd_memory_target” value by default. This is a fixed value (4GB by default), which does not depend on OSD size. So shouldn’t the advice really be “4GB per OSD”, rather than “1GB per TB”? Would it also be reasonable to reduce osd_memory_target for further RAM savings?
>
> Yes
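> 
> For example, to drop the target to 2 GB (the option takes bytes; the
> figures here are just illustrative):
> 
>     [osd]
>     osd_memory_target = 2147483648
> 
> or at runtime:
> 
>     ceph config set osd osd_memory_target 2147483648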
>
> > For example, suppose we have 90 12TB OSD drives:
>
> Please don't put 90 drives in one node; that's not a good idea in 99.9% of use cases.
>
> >
> > “1GB per TB” rule: 1080GB RAM
> > “4GB per OSD” rule: 360GB RAM
> > “2GB per OSD” (osd_memory_target reduced to 2GB): 180GB RAM
> >
> >
> >
> > Those are some massively different RAM values. Perhaps the old advice was for filestore? Or is there something to consider beyond the bluestore memory target? What about when using very dense nodes (for example, 60 12TB OSDs on a single node)?
>
> Keep in mind that it's only a target value; the OSD will use more during recovery if you set a low value.
> We usually set a target of 3 GB per OSD and recommend 4 GB of RAM per OSD.
>
> RAM-saving trick: use fewer PGs than recommended.
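> 
> E.g. for a 60-OSD node (rough sketch, exact figures are illustrative):
> 
>     ceph config set osd osd_memory_target 3221225472   # 3 GiB target
>     # 60 OSDs x 4 GB = 240 GB -> a 256 GB node leaves headroom
>     # for the OS, recovery spikes and page cache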
>
>
> Paul
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



