Hi Mark,
here is a first collection of heap profiling data (valid 30 days):
https://files.dtu.dk/u/53HHic_xx5P1cceJ/heap_profiling-2020-08-03.tgz?l
This was collected with the following config settings:
osd dev osd_memory_cache_min 805306368
osd basic osd_memory_target 2147483648
Setting the cache_min value seems to help keep cache space available. Unfortunately, the above collection covers only 12 days. I had to restart the OSD and will need to restart it again soon. I hope I can then run a longer sample. The profiling does cause slow ops, though.
Maybe you can see something already? It seems to have captured some leaked memory. Unfortunately, it was a period of extremely low load. Basically, utilization dropped to almost zero from the day the recording started.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 21 July 2020 12:57:32
To: Mark Nelson; Dan van der Ster
Cc: ceph-users
Subject: Re: OSD memory leak?
Quick question: Is there a way to change the frequency of heap dumps? On this page http://goog-perftools.sourceforge.net/doc/heap_profiler.html a function HeapProfilerSetAllocationInterval() is mentioned, but no other way of configuring this. Is there a config parameter or a ceph daemon call to adjust this?
If not, can I change the dump path?
It's likely to overrun my log partition quickly if I cannot adjust either of the two.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 20 July 2020 15:19:05
To: Mark Nelson; Dan van der Ster
Cc: ceph-users
Subject: Re: OSD memory leak?
Dear Mark,
thank you very much for the very helpful answers. I will raise osd_memory_cache_min, leave everything else alone and watch what happens. I will report back here.
Thanks also for raising this as an issue.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Mark Nelson <mnelson@xxxxxxxxxx>
Sent: 20 July 2020 15:08:11
To: Frank Schilder; Dan van der Ster
Cc: ceph-users
Subject: Re: Re: OSD memory leak?
On 7/20/20 3:23 AM, Frank Schilder wrote:
Dear Mark and Dan,
I'm in the process of restarting all OSDs and could use some quick advice on bluestore cache settings. My plan is to set higher minimum values and deal with accumulated excess usage via regular restarts. Looking at the documentation (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/), I find the following relevant options (with defaults):
# Automatic Cache Sizing
osd_memory_target {4294967296} # 4GB
osd_memory_base {805306368} # 768MB
osd_memory_cache_min {134217728} # 128MB
# Manual Cache Sizing
bluestore_cache_meta_ratio {.4} # 40% ?
bluestore_cache_kv_ratio {.4} # 40% ?
bluestore_cache_kv_max {512 * 1024*1024} # 512MB
Q1) If I increase osd_memory_cache_min, should I also increase osd_memory_base by the same or some other amount?
osd_memory_base is a hint at how much memory the OSD could consume
outside the cache once it's reached steady state. It basically sets a
hard cap on how much memory the cache will use to avoid over-committing
memory and thrashing when we exceed the memory limit. It's not necessary
to get it right, it just helps smooth things out by making the automatic
memory tuning less aggressive. I.e., if you have a 2 GB memory target and
a 512MB base, you'll never assign more than 1.5GB to the cache on the
assumption that the rest of the OSD will eventually need 512MB to
operate even if it's not using that much right now. I think you can
probably just leave it alone. What you and Dan appear to be seeing is
that this number isn't static in your case but increases over time
anyway. Eventually I'm hoping that we can automatically account for more
and more of that memory by reading the data from the mempools.
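As a rough illustration of the arithmetic in the 2 GB / 512MB example above, here is a minimal Python sketch. The helper name cache_cap_bytes is made up for illustration and is not a Ceph API. It only reproduces the "target minus base" reasoning from this reply and ignores anything else the real autotuner may take into account.

```python
MIB = 1024 * 1024


def cache_cap_bytes(memory_target: int, memory_base: int) -> int:
    """Hypothetical helper: upper bound on cache size as described above.

    Assumption: the cap is simply osd_memory_target minus osd_memory_base.
    The real autotuner may account for more than this.
    """
    return max(memory_target - memory_base, 0)


# Example from the reply: 2 GB target, 512MB base.
print(cache_cap_bytes(2048 * MIB, 512 * MIB) // MIB, "MiB")

# osd_memory_target from this thread with the documented default base.
print(cache_cap_bytes(2147483648, 805306368) // MIB, "MiB")
```

Under these assumptions, the first call gives the 1.5GB figure mentioned above, and the second suggests that a 2 GiB target with the default 768MB base leaves about 1280 MiB as the most the cache could be assigned.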
Q2) The cache ratio options are shown under the section "Manual Cache Sizing". Do they also apply when cache auto tuning is enabled? If so, is it worth changing these defaults for higher values of osd_memory_cache_min?
They actually do have an effect on the automatic cache sizing and
probably shouldn't only be under the manual section. When you have the
automatic cache sizing enabled, those options will affect the "fair
share" values of the different caches at each cache priority level. I.e.,
at priority level 0, if both caches want more memory than is available,
those ratios will determine how much each cache gets. If there is more
memory available than requested, each cache gets as much as it wants
and we move on to the next priority level and do the same thing again.
So in this case the ratios end up being sort of more like fallback
settings for when you don't have enough memory to fulfill all cache
requests at a given priority level, but otherwise are not utilized until
we hit that limit. The goal with this scheme is to make sure that "high
priority" items in each cache get first dibs at the memory even if it
might skew the ratios. This might be things like rocksdb bloom filters
and indexes, or potentially very recent hot items in one cache vs very
old items in another cache. The ratios become more like guidelines than
hard limits.
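The per-priority-level behaviour described above could be sketched roughly as follows. This is a simplified illustration, not the actual Ceph code: the function assign_level, the cache names, and the request sizes are all made up, and the sketch does not redistribute memory that a cache leaves unused below its fair share.

```python
def assign_level(available: int, requests: dict, ratios: dict) -> dict:
    """Hypothetical sketch of memory assignment at one priority level.

    requests: bytes each cache wants at this priority level.
    ratios:   "fair share" fraction per cache (assumed to sum to 1).
    """
    if sum(requests.values()) <= available:
        # Enough memory for everyone: each cache gets what it asked for.
        return dict(requests)
    # Not enough memory: the ratios act as the fallback split.
    return {
        name: min(wanted, int(available * ratios[name]))
        for name, wanted in requests.items()
    }


# Made-up numbers in MiB; ratios follow the defaults quoted in this thread.
ratios = {"meta": 0.4, "kv": 0.4, "data": 0.2}
requests = {"meta": 800, "kv": 600, "data": 100}

print(assign_level(2000, requests, ratios))  # requests fit, ratios unused
print(assign_level(1000, requests, ratios))  # requests exceed what is available
```

In the first call the ratios play no role because every request fits. In the second call the ratios bound what each cache receives at this level, which matches the "fallback" description above.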
When you change to manual mode, you set an overall bluestore cache size
and each cache gets a flat percentage of it based on the ratios. With
0.4/0.4 you will always have 40% for onode, 40% for omap, and 20% for
data even if one of those caches does not use all of its memory.
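For comparison, the flat split in manual mode can be sketched like this. The function manual_split is hypothetical and only mirrors the 40/40/20 description above. It does not model bluestore_cache_kv_max or any other limit.

```python
def manual_split(cache_size: int, meta_ratio: float = 0.4, kv_ratio: float = 0.4) -> dict:
    """Hypothetical sketch: fixed per-cache shares of a manually set cache size.

    Assumption: onode and omap get meta_ratio and kv_ratio of the total,
    and data gets the remainder, regardless of actual usage.
    """
    onode = int(cache_size * meta_ratio)
    omap = int(cache_size * kv_ratio)
    data = cache_size - onode - omap
    return {"onode": onode, "omap": omap, "data": data}


# Made-up total of 1000 MiB with the default 0.4/0.4 ratios.
print(manual_split(1000))
```

With the default ratios and a total of 1000 units, this gives 400 for onode, 400 for omap, and 200 for data, whether or not each cache actually fills its share.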
Many thanks for your help with this. I can't find answers to these questions in the docs.
There might be two reasons for high osd_map memory usage. One is that our OSDs seem to hold a large number of OSD maps:
I brought this up in our core team standup last week. Not sure if
anyone has had time to look at it yet though.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx