Re: OSD memory leak?

On 7/20/20 3:23 AM, Frank Schilder wrote:
Dear Mark and Dan,

I'm in the process of restarting all OSDs and could use some quick advice on bluestore cache settings. My plan is to set higher minimum values and deal with accumulated excess usage via regular restarts. Looking at the documentation (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/), I find the following relevant options (with defaults):

# Automatic Cache Sizing
osd_memory_target {4294967296} # 4GB
osd_memory_base {805306368} # 768MB
osd_memory_cache_min {134217728} # 128MB

# Manual Cache Sizing
bluestore_cache_meta_ratio {.4} # 40% ?
bluestore_cache_kv_ratio {.4} # 40% ?
bluestore_cache_kv_max {512 * 1024*1024} # 512MB
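The byte values above can be checked against their shorthand comments with a few lines of Python. This is a minimal sketch that assumes the comments mean binary units (MiB/GiB); the dictionary and helper names are illustrative only.

```python
# Default values copied from the option list above (mimic documentation).
DEFAULTS = {
    "osd_memory_target": 4294967296,
    "osd_memory_base": 805306368,
    "osd_memory_cache_min": 134217728,
    "bluestore_cache_kv_max": 512 * 1024 * 1024,
}

MIB = 1024 ** 2  # one mebibyte (2**20 bytes)


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


for name, value in DEFAULTS.items():
    print(f"{name:24} {value:>12} bytes = {to_mib(value):>6.0f} MiB")
```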

Q1) If I increase osd_memory_cache_min, should I also increase osd_memory_base by the same or some other amount?


osd_memory_base is a hint at how much memory the OSD could consume outside the cache once it has reached steady state.  It basically sets a hard cap on how much memory the cache will use, to avoid over-committing memory and thrashing when we exceed the memory limit.  It doesn't need to be exact; it just helps smooth things out by making the automatic memory tuning less aggressive.  For example, if you have a 2 GB memory target and a 512 MB base, you'll never assign more than 1.5 GB to the cache, on the assumption that the rest of the OSD will eventually need 512 MB to operate even if it isn't using that much right now.  I think you can probably just leave it alone.  What you and Dan appear to be seeing is that, in your case, this number isn't static but increases over time anyway.  Eventually I'm hoping that we can automatically account for more and more of that memory by reading the data from the mempools.
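The cap described above is simple subtraction. Below is a minimal Python sketch of that arithmetic. The function name is hypothetical, flooring the result at osd_memory_cache_min is an assumption of this sketch, and the real autotuner may apply further adjustments that are not modelled here.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_cache_bytes(memory_target: int, memory_base: int, cache_min: int) -> int:
    """Upper bound on the autotuned cache in a simplified model.

    The cache may use whatever the target leaves after reserving
    memory_base for the rest of the OSD. The floor at cache_min is an
    assumption of this sketch, not taken from the Ceph source.
    """
    headroom = memory_target - memory_base
    return max(headroom, cache_min)


# The example from the reply: 2 GB target, 512 MB base -> 1.5 GB cap.
cap = max_cache_bytes(2 * GIB, 512 * MIB, 128 * MIB)
print(cap / GIB)  # 1.5
```

With the mimic defaults listed earlier (4 GB target, 768 MB base), the same arithmetic gives a cap of 3328 MiB. In this simplified model, raising osd_memory_cache_min only changes the result once it exceeds target minus base.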

Q2) The cache ratio options are shown under the section "Manual Cache Sizing". Do they also apply when cache auto tuning is enabled? If so, is it worth changing these defaults for higher values of osd_memory_cache_min?


They actually do have an effect on the automatic cache sizing and probably shouldn't be listed only under the manual section.  When automatic cache sizing is enabled, those options affect the "fair share" values of the different caches at each cache priority level.  For example, at priority level 0, if both caches want more memory than is available, those ratios determine how much each cache gets.  If more memory is available than requested, each cache gets as much as it wants, and we move on to the next priority level and do the same thing again.  So in this case the ratios act more like fallback settings for when there isn't enough memory to fulfill all cache requests at a given priority level; otherwise they aren't used until we hit that limit.  The goal of this scheme is to make sure that "high priority" items in each cache get first dibs on the memory, even if that skews the ratios.  These might be things like rocksdb bloom filters and indexes, or very recent hot items in one cache versus very old items in another.  The ratios become more like guidelines than hard limits.
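To make the priority-level behaviour concrete, here is a minimal Python sketch of the scheme as described above. It is a simplified model, not BlueStore's actual tuning code: the function and argument names are hypothetical, the cache names "meta"/"kv"/"data" merely mirror the ratio options, and the ratios are written as integer percentages so the arithmetic stays exact.

```python
from typing import Dict, List


def allocate(available: int,
             requests_by_priority: List[Dict[str, int]],
             weights: Dict[str, int]) -> Dict[str, int]:
    """Simplified model of per-priority fair-share cache sizing.

    requests_by_priority[p][name] is how many bytes cache `name` asks for
    at priority level p (index 0 = highest priority). weights holds the
    cache ratios as integer percentages.
    """
    assigned = {name: 0 for name in weights}
    for requests in requests_by_priority:
        wanted = sum(requests.values())
        if wanted <= available:
            # Enough memory at this level: every cache gets what it asked
            # for, and the ratios are never consulted.
            for name, want in requests.items():
                assigned[name] += want
            available -= wanted
            continue
        # Contention: split the remaining memory by the ratios (fair share),
        # never granting a cache more than it requested. Leftover bytes from
        # a cache that wanted less than its share are NOT redistributed here.
        total_weight = sum(weights[name] for name in requests)
        for name, want in requests.items():
            share = available * weights[name] // total_weight
            assigned[name] += min(want, share)
        # Treat memory as exhausted after a contended level, so lower
        # priority levels get nothing.
        break
    return assigned


ratios = {"meta": 40, "kv": 40, "data": 20}  # 0.4 / 0.4 / remainder
levels = [
    {"meta": 200, "kv": 300, "data": 0},    # priority 0: fits entirely
    {"meta": 600, "kv": 600, "data": 600},  # priority 1: must be shared
]
print(allocate(1000, levels, ratios))  # {'meta': 400, 'kv': 500, 'data': 100}
```

At priority 0 both requests fit, so the ratios are ignored. Only at priority 1, where 1800 bytes are requested but 500 remain, do the 40/40/20 ratios decide the split.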


When you switch to manual mode, you set an overall bluestore cache size and each cache gets a flat percentage of it based on the ratios.  With 0.4/0.4 you will always have 40% for onodes (meta), 40% for omap (kv), and 20% for data, even if one of those caches does not use all of its memory.
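The manual-mode split is plain arithmetic. This is a minimal Python sketch, assuming (as described above) that data simply receives whatever the meta and kv ratios leave over. The function name is hypothetical, and the sketch does not model bluestore_cache_kv_max or any other cap.

```python
def manual_split(cache_size: int,
                 meta_ratio: float = 0.4,
                 kv_ratio: float = 0.4) -> dict:
    """Fixed split of a manually sized bluestore cache.

    Each cache gets a flat fraction regardless of how much it actually
    uses; data receives the remainder. bluestore_cache_kv_max is ignored.
    """
    meta = int(cache_size * meta_ratio)
    kv = int(cache_size * kv_ratio)
    data = cache_size - meta - kv
    return {"meta": meta, "kv": kv, "data": data}


split = manual_split(1024 ** 3)  # hypothetical 1 GiB cache
for name, size in split.items():
    print(f"{name:5} {size / 1024 ** 2:7.1f} MiB")
```

Unlike the autotuned case, nothing here looks at demand, which is why an underused cache still keeps its full share.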



Many thanks for your help with this. I can't find answers to these questions in the docs.

There might be two reasons for high osd_map memory usage. One is that our OSDs seem to hold a large number of OSD maps:


I brought this up in our core team standup last week.  Not sure if anyone has had time to look at it yet though.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



