Re: mds and mon memory targets

Joao Eduardo Luis <joao@xxxxxxx> · Wed, 31 Oct 2018 22:14:03 +0000

On 10/31/2018 07:57 PM, Sage Weil wrote:
> On Wed, 31 Oct 2018, Joao Eduardo Luis wrote:
>> If it turns out that we can indeed adjust the rocksdb cache, then my
>> point is moot, and we can even aim for the stretch goal of having two
>> settings: the default and the max; start with default, and allow the mon
>> to suppress warnings until max is almost reached while increasing the
>> cache sizes. That should allow us to somewhat keep small/medium clusters
>> under control, while not having those larger clusters being bombarded
>> with memory warnings until the maximum is near.
> 
> I'm not sure I'm following this.  If we set mon_target_memory = 2G, say, 
> by default, it will increase to 2G RSS as soon as the rocksdb database 
> size gets that big (since it'll just keep it all in RAM) or enough OSDMaps 
> go by that we keep them all in memory.  I'm not sure if/how we can have a 
> separate max come in unless we have some way to detect if we are 
> "thrashing" such that adding cache will help us significantly...  I was 
> thinking we'd pick some numbers semi-arbitrarily (e.g., OSDMap size 
> * 1000) to extrapolate a minimal amount of memory we want, and warn if the 
> target is below that.  (We can just pick out a few large clusters and 
> compare their map sizes to rocksdb sizes, perhaps, to pick these 
> constants?)

Generally speaking, I'm not too concerned about the monitor's memory
consumption under normal operation - we do bound the number of maps we
keep, although their sizes can be a problem. What you suggest would
certainly help with this, and would give us a ballpark value depending
on cluster size.
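
To make the quoted suggestion concrete, here is a minimal sketch of the
"OSDMap size * 1000" extrapolation. It is not Ceph code: the function
names are hypothetical, and the 1000 multiplier is only the
semi-arbitrary figure from the quoted text.

```python
# Hypothetical sketch, not Ceph code: names are illustrative only.
MAPS_FACTOR = 1000  # semi-arbitrary "OSDMap size * 1000" multiplier from the quoted text


def min_wanted_memory(osdmap_size_bytes: int, factor: int = MAPS_FACTOR) -> int:
    """Extrapolate a minimal amount of memory from the size of one OSDMap."""
    return osdmap_size_bytes * factor


def target_too_low(target_bytes: int, osdmap_size_bytes: int) -> bool:
    """True if the configured memory target is below the extrapolated minimum."""
    return target_bytes < min_wanted_memory(osdmap_size_bytes)
```

With these assumptions, a 2 GiB target would trigger the warning on a
cluster whose OSDMap is 4 MiB, but not on one whose OSDMap is 1 MiB.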

But I think a separate max can be useful for times when the cluster is
unclean and we are keeping more maps than we usually would. The osdmap
pruning mechanism will help keep a smaller number of maps in the
store, but we'll still be keeping them in the osdmap cache. And as the
number of maps increases, having a max value up to which we can increase
the cache sizes could be helpful - and then, maybe, reduce them once
we're back to normal operation. Extrapolation could still be similar, as
a function of the number of tracked maps times their sizes.
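
A rough sketch of the default/max idea, assuming the cache target is
simply tracked maps times average map size, clamped between the two
settings. This is not Ceph code: the function names and the 0.9
"almost reached" threshold are assumptions for illustration.

```python
# Hypothetical sketch, not Ceph code.
def cache_target(tracked_maps: int, avg_map_bytes: int,
                 default_bytes: int, max_bytes: int) -> int:
    """Grow the cache target with the tracked maps, clamped to [default, max]."""
    wanted = tracked_maps * avg_map_bytes
    return max(default_bytes, min(wanted, max_bytes))


def should_warn(usage_bytes: int, max_bytes: int, near_max: float = 0.9) -> bool:
    """Stay quiet between default and max; warn only when max is almost reached."""
    return usage_bytes >= near_max * max_bytes
```

Because the target is recomputed from the current number of tracked
maps, it falls back toward the default once the cluster is clean again
and the maps are trimmed.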

But yeah, the thrashing detection would be annoying, and it would have
to be, at best, a guess based on the maps we hold, the maps we think we
should hold, and maybe even the cluster's health.
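
One possible reading of that guess, as a sketch only. This is not Ceph
code: the function, its inputs and the 1.5 slack factor are all
hypothetical.

```python
# Hypothetical heuristic, not Ceph code.
def probably_needs_more_cache(maps_held: int, maps_expected: int,
                              cluster_healthy: bool, slack: float = 1.5) -> bool:
    """Guess whether growing the caches toward the max would help.

    maps_held:       maps currently tracked
    maps_expected:   maps we think we should hold in normal operation
    cluster_healthy: health factored in as a simple flag
    slack:           assumed tolerance before the backlog counts as abnormal
    """
    if maps_expected <= 0:
        return False
    backlog = maps_held > slack * maps_expected
    # Only treat a backlog as a reason to grow the caches while the
    # cluster is unhealthy; a healthy cluster should trim on its own.
    return backlog and not cluster_healthy
```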

  -Joao