Re: OSD memory leak?

Dear Mark,

Thank you very much for the helpful answers. I will raise osd_memory_cache_min, leave everything else alone, and watch what happens. I will report back here.

Thanks also for raising this as an issue.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Mark Nelson <mnelson@xxxxxxxxxx>
Sent: 20 July 2020 15:08:11
To: Frank Schilder; Dan van der Ster
Cc: ceph-users
Subject: Re: OSD memory leak?

On 7/20/20 3:23 AM, Frank Schilder wrote:
> Dear Mark and Dan,
>
> I'm in the process of restarting all OSDs and could use some quick advice on bluestore cache settings. My plan is to set higher minimum values and deal with accumulated excess usage via regular restarts. Looking at the documentation (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/), I find the following relevant options (with defaults):
>
> # Automatic Cache Sizing
> osd_memory_target {4294967296} # 4GB
> osd_memory_base {805306368} # 768MB
> osd_memory_cache_min {134217728} # 128MB
>
> # Manual Cache Sizing
> bluestore_cache_meta_ratio {.4} # 40% ?
> bluestore_cache_kv_ratio {.4} # 40% ?
> bluestore_cache_kv_max {512 * 1024*1024} # 512MB
>
> Q1) If I increase osd_memory_cache_min, should I also increase osd_memory_base by the same or some other amount?


osd_memory_base is a hint at how much memory the OSD could consume
outside the cache once it has reached steady state.  It basically sets a
hard cap on how much memory the cache will use, to avoid over-committing
memory and thrashing when we exceed the memory limit.  It's not necessary
to get it exactly right; it just helps smooth things out by making the
automatic memory tuning less aggressive.  I.e. if you have a 2 GB memory
target and a 512 MB base, you'll never assign more than 1.5 GB to the
cache, on the assumption that the rest of the OSD will eventually need
512 MB to operate even if it's not using that much right now.  I think
you can probably just leave it alone.  What you and Dan appear to be
seeing is that this number isn't static in your case but increases over
time anyway.  Eventually I'm hoping that we can automatically account
for more and more of that memory by reading the data from the mempools.
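
To make the arithmetic concrete, here is a rough sketch of that cap (my
simplified model of the behaviour described above, not the actual OSD
tuner code; the numbers are just the example values from the paragraph):

    # Simplified model: with automatic cache sizing, the caches are never
    # assigned more than osd_memory_target - osd_memory_base.
    osd_memory_target = 2 * 1024**3      # 2 GB target (example from above)
    osd_memory_base = 512 * 1024**2      # 512 MB assumed non-cache baseline
    max_cache_bytes = osd_memory_target - osd_memory_base
    print(max_cache_bytes / 1024**2)     # 1536.0, i.e. at most 1.5 GB for the caches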

> Q2) The cache ratio options are shown under the section "Manual Cache Sizing". Do they also apply when cache auto tuning is enabled? If so, is it worth changing these defaults for higher values of osd_memory_cache_min?


They actually do have an effect on the automatic cache sizing and
probably shouldn't only be listed under the manual section.  When you
have automatic cache sizing enabled, those options affect the "fair
share" values of the different caches at each cache priority level.  I.e.
at priority level 0, if both caches want more memory than is available,
those ratios determine how much each cache gets.  If there is more
memory available than requested, each cache gets as much as it wants
and we move on to the next priority level and do the same thing again.
So in this case the ratios end up being more like fallback settings for
when you don't have enough memory to fulfill all cache requests at a
given priority level, but otherwise they are not used until we hit that
limit.  The goal of this scheme is to make sure that "high priority"
items in each cache get first dibs at the memory even if it might skew
the ratios.  This might be things like rocksdb bloom filters and
indexes, or potentially very recent hot items in one cache vs very old
items in another cache.  The ratios become more like guidelines than
hard limits.
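
In pseudo-Python, my reading of that allocation scheme looks roughly
like this (a simplified sketch, not the actual OSD code; names and
numbers are made up for illustration):

    # Each cache requests memory per priority level.  A level is filled
    # completely if there is room; otherwise the ratios split what's left.
    def assign(requests_per_level, total, ratios):
        """requests_per_level: list of dicts {cache_name: bytes wanted at that level}."""
        assigned = {name: 0 for name in ratios}
        remaining = total
        for level in requests_per_level:
            wanted = sum(level.values())
            if wanted <= remaining:
                # Enough memory: every cache gets what it asked for at this level.
                for name, req in level.items():
                    assigned[name] += req
                remaining -= wanted
            else:
                # Not enough: fall back to the ratios to split the remainder.
                for name, req in level.items():
                    assigned[name] += min(req, int(remaining * ratios[name]))
                remaining = 0
                break
        return assigned

    # Example: at priority 0 both caches fit; at priority 1 they don't,
    # so the 0.4/0.4 ratios decide how the leftover 400 units are split.
    print(assign(
        [{"onode": 200, "omap": 100}, {"onode": 600, "omap": 600}],
        total=700,
        ratios={"onode": 0.4, "omap": 0.4},
    ))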


When you switch to manual mode, you set an overall bluestore cache size
and each cache gets a flat percentage of it based on the ratios.  With
0.4/0.4 you will always have 40% for onode, 40% for omap, and 20% for
data, even if one of those caches does not use all of its memory.
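
As a sketch (again just illustrative, with an example cache size and
ignoring bluestore_cache_kv_max for simplicity):

    bluestore_cache_size = 1 * 1024**3    # example overall cache size (manual mode)
    meta_ratio, kv_ratio = 0.4, 0.4
    onode_cache = bluestore_cache_size * meta_ratio                   # 40% for onodes
    kv_cache = bluestore_cache_size * kv_ratio                        # 40% for omap/rocksdb
    data_cache = bluestore_cache_size * (1 - meta_ratio - kv_ratio)   # remaining 20% for data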


>
> Many thanks for your help with this. I can't find answers to these questions in the docs.
>
> There might be two reasons for high osd_map memory usage. One is that our OSDs seem to hold a large number of OSD maps:


I brought this up in our core team standup last week.  Not sure if
anyone has had time to look at it yet though.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



