Re: luminous/bluestore osd memory requirements

Did any of that testing involve a degraded cluster, backfilling, peering, etc.? A healthy cluster running normally can sometimes use a quarter of the memory and CPU resources of a cluster that is constantly peering and degraded.


On Sat, Aug 12, 2017, 2:40 PM Nick Fisk <nick@xxxxxxxxxx> wrote:
I was under the impression that the memory requirement for Bluestore would be
around 2-3GB per OSD, regardless of capacity.
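
As a rough back-of-the-envelope comparison of that rule of thumb against the
1GB-of-RAM-per-TB guideline discussed further down the thread, something like
the Python sketch below works; the node sizes are made-up examples, not
recommendations:

# Rough comparison of the two RAM rules of thumb mentioned in this thread:
# a flat 2-3 GB per Bluestore OSD vs the classic 1 GB of RAM per TB.
# The node configurations below are made-up examples, not recommendations.

def ram_per_osd_rule(num_osds, gb_per_osd=3):
    """Flat per-OSD estimate (GB)."""
    return num_osds * gb_per_osd

def ram_per_tb_rule(num_osds, tb_per_osd, gb_per_tb=1):
    """Capacity-based estimate (GB)."""
    return num_osds * tb_per_osd * gb_per_tb

for osds, tb in [(12, 4), (12, 10), (24, 8)]:
    print(f"{osds} OSDs x {tb} TB: "
          f"per-OSD rule ~{ram_per_osd_rule(osds)} GB, "
          f"per-TB rule ~{ram_per_tb_rule(osds, tb)} GB")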

CPU-wise, I would lean towards working out how much total GHz you require
and then getting whatever CPU you need to reach that, with a preference for
GHz over cores. Yes, there will be a slight overhead to having more threads
running on a lower number of cores, but I believe this is fairly minimal
compared to the speed boost that the single-threaded portion of the data
path in each OSD gets from running on a faster core. Each PG takes a lock
for each operation, so any other requests for the same PG will queue up and
be processed sequentially. The faster you can get through this stage, the
better. I'm pretty sure that if you graphed PG activity on an average
cluster, you would see a strong skew towards a certain number of PGs being
hit more often than others. I think Mark N has been seeing the effects of
this PG locking in recent tests.
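
As a toy illustration of that per-PG serialization (a Python model only, not
Ceph's actual implementation):

# Toy model of per-PG locking: operations on the same PG serialize behind
# that PG's lock, while operations on different PGs can run in parallel.
import threading
import time

NUM_PGS = 4
pg_locks = {pg: threading.Lock() for pg in range(NUM_PGS)}

def handle_op(pg_id, service_time=0.01):
    # The per-PG lock: requests for the same PG queue here and are
    # processed sequentially; a faster core shortens service_time.
    with pg_locks[pg_id]:
        time.sleep(service_time)

ops = [threading.Thread(target=handle_op, args=(i % NUM_PGS,))
       for i in range(32)]
start = time.time()
for t in ops:
    t.start()
for t in ops:
    t.join()
print(f"32 ops over {NUM_PGS} PGs took {time.time() - start:.2f}s "
      f"(lower bound: {32 // NUM_PGS * 0.01:.2f}s of serialized work per PG)")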

Also, don't forget to make sure your CPUs are running at C-state C1 and
maximum frequency. This can sometimes give up to a 4x reduction in latency.
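
If you want to check that on a node, something along these lines reads the
standard Linux cpufreq/cpuidle sysfs files (a sketch; exact paths and
availability depend on your kernel and idle driver):

# Quick check of the CPU frequency governor and idle (C-)state usage for
# the first core, using the standard Linux cpufreq/cpuidle sysfs files.
from pathlib import Path

cpu0 = Path("/sys/devices/system/cpu/cpu0")

gov = cpu0 / "cpufreq" / "scaling_governor"
if gov.exists():
    print("governor:", gov.read_text().strip())  # ideally "performance"

cpuidle = cpu0 / "cpuidle"
if cpuidle.is_dir():
    for state in sorted(cpuidle.glob("state*")):
        name = (state / "name").read_text().strip()
        usage = (state / "usage").read_text().strip()
        print(f"{state.name}: {name}, entered {usage} times")
# Heavy use of deep states (C3/C6 and below) suggests latency could
# benefit from limiting C-states to C1 and forcing maximum frequency.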

Also, if you look at the number of threads running on an OSD node, it will
easily run into the hundreds; each OSD process itself has several threads.
So don't assume that 12 OSDs = a 12-core processor.
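
A quick way to see this for yourself is to count the threads of each
ceph-osd process via /proc on an OSD node (Linux-only sketch):

# Count threads per ceph-osd process by listing /proc/<pid>/task.
# Run on an OSD node to see how far beyond "one thread per OSD" it goes.
from pathlib import Path

for pid_dir in Path("/proc").iterdir():
    if not pid_dir.name.isdigit():
        continue
    try:
        comm = (pid_dir / "comm").read_text().strip()
        if comm == "ceph-osd":
            nthreads = len(list((pid_dir / "task").iterdir()))
            print(f"pid {pid_dir.name}: {nthreads} threads")
    except OSError:
        continue  # process exited or is not readable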

I did some tests to measure CPU usage per IO, which you may find useful.

http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/

I can max out 12x 7.2k disks on an E3-1240 CPU and it's only running at about
15-20%.
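
As a sketch of turning that kind of measurement into a sizing estimate (the
MHz-per-IO value below is a made-up placeholder, not a measured figure):

# Back-of-the-envelope CPU sizing from a measured "MHz per IO" figure.
# MHZ_PER_IO is a placeholder assumption; substitute the number you
# measure on your own hardware (e.g. following the post linked above).
MHZ_PER_IO = 2             # hypothetical CPU cost of one Ceph IO, in MHz
TARGET_IOPS = 12 * 150     # e.g. 12 spinners at a nominal ~150 IOPS each

required_ghz = MHZ_PER_IO * TARGET_IOPS / 1000
print(f"~{required_ghz:.1f} GHz of aggregate CPU for {TARGET_IOPS} IOPS "
      f"at {MHZ_PER_IO} MHz per IO")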

I haven't done any proper Bluestore tests, but from some rough testing the
CPU usage wasn't too dissimilar from Filestore.

Depending on whether you are running HDDs or SSDs, and how many per node, I
would possibly look at the single-socket E3s or E5s.

Having said that, the recent AMD and Intel announcements also put some
potentially interesting single-socket options for Ceph into the mix.

Hope that helps.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> Of Stijn De Weirdt
> Sent: 12 August 2017 14:41
> To: David Turner <drakonstein@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: luminous/bluestore osd memory requirements
>
> hi david,
>
> sure, i understand that. but how bad does it get when you oversubscribe
> OSDs? if context switching itself is dominant, then using HT should
> allow running double the number of OSDs on the same CPU (one OSD per HT
> core); but if the issue is actual CPU cycles, HT won't help that much
> either (1 OSD per HT core vs 2 OSDs per physical core).
>
> i guess the reason for this is that OSD processes have lots of threads?
>
> maybe i can run some tests on a ceph test cluster myself ;)
>
> stijn
>
>
> On 08/12/2017 03:13 PM, David Turner wrote:
> > The reason for an entire core per osd is that it's trying to avoid
> > context switching your CPU to death. If you have a quad-core
> > processor with HT, I wouldn't recommend more than 8 osds on the box.
> > I would probably go with 7 myself to keep one core available for
> > system operations. This recommendation has nothing to do with GHz.
> > Higher GHz per core will likely improve your cluster latency. Of
> > course, if your use case says that you only need very minimal
> > throughput... there is no need to hit or exceed the recommendation.
> > The number-of-cores recommendation is not changing for bluestore. It
> > might add a recommendation of how fast your processor should be...
> > but making it based on how much GHz per TB is an invitation to
> > context-switch to death.
> >
> > On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt
> > <stijn.deweirdt@xxxxxxxx>
> > wrote:
> >
> >> hi all,
> >>
> >> thanks for all the feedback. it's clear we should stick to the
> >> 1GB/TB for the memory.
> >>
> >> any (changes to the) recommendation for the CPU? in particular, is it
> >> still the rather vague "1 HT core per OSD" (or was it "1 1GHz HT
> >> core per OSD"?). it would be nice if we had some numbers like
> >> required SPECint per TB and/or per Gb/s. also any indication how
> >> much more CPU EC uses (10%, 100%, ...)?
> >>
> >> i'm aware that this also depends on the use case, but i'll take
> >> any pointers i can get. we will probably end up overprovisioning,
> >> but it would be nice if we can avoid a whole cpu (32GB DIMMs are
> >> cheap, so lots of ram with a single socket is really possible these
> >> days).
> >>
> >> stijn
> >>
> >> On 08/10/2017 05:30 PM, Gregory Farnum wrote:
> >>> This has been discussed a lot in the performance meetings so I've
> >>> added Mark to discuss. My naive recollection is that the
> >>> per-terabyte recommendation will be more realistic than it was in
> >>> the past (an effective increase in memory needs), but also that it
> >>> will be under much better control than previously.
> >>>
> >>> On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt
> >>> <stijn.deweirdt@xxxxxxxx>
> >>>
> >>> wrote:
> >>>
> >>>> hi all,
> >>>>
> >>>> we are planning to purchase new OSD hardware, and we are wondering
> >>>> if for upcoming luminous with bluestore OSDs, anything wrt the
> >>>> hardware recommendations from
> >>>> http://docs.ceph.com/docs/master/start/hardware-recommendations/
> >>>> will be different, especially the memory/CPU part. i understand from
> >>>> colleagues that the async messenger makes a big difference in
> >>>> memory usage (maybe also cpu load?); but we are also interested
> >>>> in
> the "1GB of RAM per TB"
> >>>> recommendation/requirement.
> >>>>
> >>>> many thanks,
> >>>>
> >>>> stijn

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
