On Mon, Mar 4, 2019 at 1:23 PM Sebastien Han <shan@xxxxxxxxxx> wrote:
>
> Hi,
>
> I'm writing this because I'm currently implementing memory tuning in
> Rook. The implementation is mostly based on the following pod
> properties:
>
> * memory.limit: defines a hard cap for the memory; if the container
>   tries to allocate more memory than the specified limit, it gets
>   terminated.
> * memory.request: used for scheduling only (and for the OOM strategy
>   when applying QoS).
>
> If memory.request is omitted for a container, it defaults to the
> limit. If memory.limit is not set, it defaults to 0 (unbounded). If
> neither of the two is specified, we don't tune anything, because we
> don't really know what to do.
>
> So far I've collected a couple of Ceph flags that are worth tuning:
>
> * mds_cache_memory_limit
> * osd_memory_target
>
> These flags will be passed at instantiation time for the MDS and OSD
> daemons. Since most of the daemons have some cache flag, it would be
> nice to unify them under a new option, --{daemon}-memory-target.
> Currently I'm also exposing pod properties via environment variables
> that Ceph can consume later for more autotuning
> (POD_{MEMORY,CPU}_LIMIT, POD_{CPU,MEMORY}_REQUEST).

Hmm, these names differ for a reason. The osd_memory_target is an
actual OSD target (although it's quite limited; the only real knob is
the bluestore cache sizes), whereas the mds_cache_memory_limit tries
to control the cache size but does not look at the total MDS memory
usage. There's a formula for roughly how much memory you can expect to
actually be used, but I forget what it is.

> One other cool thing would be to report (when containerized) that the
> daemon is close to its cgroup memory limit, so we could surface
> something in "ceph -s", or Ceph could re-adjust some of its internal
> values.
> As part of that PR I'm also implementing failures based on
> memory.limit per daemon, so I need to know the minimum amount of
> memory we want to recommend in production. It's not an easy thing to
> do, but we have to start somewhere.

I... think the defaults we already have are as close to a "universal"
recommendation as we can get. This needs to be easy to configure since
it will change based on expected use case.
-Greg

> Thanks!
> ---------
> Sébastien Han
> Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
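
The defaulting rules Sebastien describes (an omitted request falls back
to the limit, an omitted limit means 0/unbounded, and no tuning when
neither is set) could be sketched as follows in Go, Rook's
implementation language. This is a minimal sketch, not Rook's actual
code; the function name resolveMemory and its return convention are
hypothetical.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resolveMemory applies the defaulting rules from the mail: an omitted
// request falls back to the limit, an omitted limit means 0 (unbounded),
// and if neither is set we skip tuning entirely.
func resolveMemory(res corev1.ResourceRequirements) (request, limit int64, tune bool) {
	limitQ, hasLimit := res.Limits[corev1.ResourceMemory]
	requestQ, hasRequest := res.Requests[corev1.ResourceMemory]

	if !hasLimit && !hasRequest {
		return 0, 0, false // nothing specified: don't tune anything
	}
	if hasLimit {
		limit = limitQ.Value()
	} // else limit stays 0, i.e. unbounded
	if hasRequest {
		request = requestQ.Value()
	} else {
		request = limit // request defaults to the limit
	}
	return request, limit, true
}

func main() {
	res := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("4Gi"),
		},
	}
	req, lim, tune := resolveMemory(res)
	fmt.Println(req, lim, tune) // 4294967296 4294967296 true
}
```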
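Wiring the resolved limit into the daemon, both as an instantiation
flag and as the POD_* environment variables via the Kubernetes
Downward API, might look like the sketch below. The 80% headroom ratio
and the helper names memoryTargetFlag/podResourceEnv are assumptions
for illustration, not Rook's real values.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// memoryTargetFlag derives a daemon flag such as --osd-memory-target
// from the container memory limit, leaving headroom for allocations the
// target does not cover (the 80% ratio here is illustrative only).
func memoryTargetFlag(daemon string, limitBytes int64) string {
	target := limitBytes * 8 / 10
	return fmt.Sprintf("--%s-memory-target=%d", daemon, target)
}

// podResourceEnv exposes the pod resource properties to the daemon via
// the Kubernetes Downward API (resourceFieldRef), so Ceph can consume
// them later for more autotuning.
func podResourceEnv(container string) []corev1.EnvVar {
	mk := func(name, res string) corev1.EnvVar {
		return corev1.EnvVar{
			Name: name,
			ValueFrom: &corev1.EnvVarSource{
				ResourceFieldRef: &corev1.ResourceFieldSelector{
					ContainerName: container,
					Resource:      res,
				},
			},
		}
	}
	return []corev1.EnvVar{
		mk("POD_MEMORY_LIMIT", "limits.memory"),
		mk("POD_MEMORY_REQUEST", "requests.memory"),
		mk("POD_CPU_LIMIT", "limits.cpu"),
		mk("POD_CPU_REQUEST", "requests.cpu"),
	}
}

func main() {
	fmt.Println(memoryTargetFlag("osd", 4<<30)) // --osd-memory-target=3435973836
	fmt.Println(len(podResourceEnv("osd")))     // 4
}
```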
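Finally, the per-daemon failure check based on memory.limit could take
a shape like this; the threshold values are placeholders, since the
thread leaves the real production minimums as an open question.

```go
package main

import "fmt"

// minRecommendedBytes holds illustrative per-daemon floors; these are
// hypothetical numbers, not an official Ceph recommendation.
var minRecommendedBytes = map[string]int64{
	"osd": 2 << 30, // placeholder: 2 GiB
	"mds": 1 << 30, // placeholder: 1 GiB
	"mon": 1 << 30, // placeholder: 1 GiB
}

// checkMemoryLimit fails early when a pod's memory.limit is below the
// recommended floor for that daemon; a limit of 0 means unbounded and
// is accepted as-is.
func checkMemoryLimit(daemon string, limitBytes int64) error {
	floor, ok := minRecommendedBytes[daemon]
	if !ok || limitBytes == 0 {
		return nil
	}
	if limitBytes < floor {
		return fmt.Errorf("%s memory limit %d is below the recommended minimum %d",
			daemon, limitBytes, floor)
	}
	return nil
}

func main() {
	if err := checkMemoryLimit("osd", 1<<30); err != nil {
		fmt.Println(err)
	}
}
```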