Re: mds and mon memory targets

Sage Weil <sweil@xxxxxxxxxx> · Wed, 31 Oct 2018 19:57:14 +0000 (UTC)

On Wed, 31 Oct 2018, Joao Eduardo Luis wrote:
> On 10/31/2018 05:44 PM, Sage Weil wrote:
> > For the mon, we have at least two caches we can adjust: the rocksdb cache, 
> > and mon_osd_cache_size.  Should we do the same thing there?
> > 
> > My only concern with a mon_target_memory option is that we have to set a 
> > default, and the reality is that we want the mon to use more memory for 
> > large clusters.  Perhaps we could issue a health alert if it looks like 
> > the mon is targeting too little memory for the cluster size, and then set 
> > the default to something that works well for medium-ish average clusters?
> 
> I generally like the idea, and this is the simplest approach.
> 
> I do worry about throwing more warnings at users, especially if we are
> not able to adjust those settings on-the-fly. The SimpleLRU backing the
> osdmap cache shouldn't be much of a problem, with an observer for the
> config option, but I don't think we can do that for the rocksdb
> underlying cache?

Actually we can (and do, in the OSD) adjust the rocksdb cache size on the 
fly.
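
To make the on-the-fly resizing concrete, here is a minimal sketch in Python (not Ceph's actual C++ code) of an LRU cache whose capacity can be changed at runtime from a config-change callback. The names ResizableLRU and handle_conf_change are hypothetical and only for illustration; mon_osd_cache_size is the option mentioned above. The rocksdb cache would need its own equivalent hook, which this sketch does not model.

```python
from collections import OrderedDict


class ResizableLRU:
    """Tiny LRU cache whose capacity can be changed while in use (illustrative only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()  # oldest entry first, newest last

    def _trim(self):
        # Evict least-recently-used entries until we fit the current limit.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def set_size(self, new_size):
        # Called at runtime; shrinking evicts immediately, growing just raises the limit.
        self.max_size = new_size
        self._trim()

    def add(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        self._trim()

    def lookup(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def __len__(self):
        return len(self._items)


def handle_conf_change(cache, changed):
    """Hypothetical config observer: apply a new mon_osd_cache_size if it changed."""
    if "mon_osd_cache_size" in changed:
        cache.set_size(int(changed["mon_osd_cache_size"]))
```

With something along these lines, changing the option only requires the observer to call set_size(); no restart is needed for the cache to honor the new limit.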

> If it turns out that we can indeed adjust the rocksdb cache, then my
> point is moot, and we can even aim for the stretch goal of having two
> settings: the default and the max; start with default, and allow the mon
> to suppress warnings until max is almost reached while increasing the
> cache sizes. That should allow us to somewhat keep small/medium clusters
> under control, while not having those larger clusters being bombarded
> with memory warnings until the maximum is near.

I'm not sure I'm following this.  If we set mon_target_memory = 2G, say, 
by default, it will increase to 2G RSS as soon as the rocksdb database 
size gets that big (since it'll just keep it all in RAM) or enough OSDMaps 
go by that we keep them all in memory.  I'm not sure if/how we can have a 
separate max come in unless we have some way to detect if we are 
"thrashing" such that adding cache will help us significantly...  I was 
thinking we'd pick some numbers semi-arbitrarily (e.g., OSDMap size 
* 1000) to extrapolate a minimal amount of memory we want, and warn if the 
target is below that.  (We can just pick out a few large clusters and 
compare their map sizes to rocksdb sizes, perhaps, to pick these 
constants?)
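
A rough sketch of that check, in Python for brevity: extrapolate a minimum from the OSDMap size and warn if the configured target is below it. The multiplier of 1000 is the semi-arbitrary number from above, not a measured value, and the function names and the warning text are made up for illustration.

```python
# Semi-arbitrary constant; would need tuning against real large clusters.
OSDMAP_MULTIPLIER = 1000


def min_recommended_memory(osdmap_bytes, multiplier=OSDMAP_MULTIPLIER):
    """Extrapolate a minimal memory target from the size of one OSDMap."""
    return osdmap_bytes * multiplier


def check_memory_target(target_bytes, osdmap_bytes, multiplier=OSDMAP_MULTIPLIER):
    """Return a warning string if the target looks too small, else None."""
    needed = min_recommended_memory(osdmap_bytes, multiplier)
    if target_bytes < needed:
        return (
            "mon memory target %d bytes is below the estimated minimum "
            "of %d bytes for this cluster" % (target_bytes, needed)
        )
    return None
```

For example, with a 2 GiB target a 1 MiB OSDMap passes (1000 MiB needed), while a 4 MiB OSDMap would trigger the warning (about 3.9 GiB needed).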

sage