Hi,

I'm writing this because I'm currently implementing memory tuning in Rook. The implementation is mostly based on the following POD properties:

* memory.limit: defines a hard cap on memory; if the container tries to allocate more memory than the specified limit, it gets terminated.
* memory.request: used for scheduling only (and for the OOM strategy when applying QoS).

If memory.requests is omitted for a container, it defaults to limits. If memory.limits is not set, it defaults to 0 (unbounded). If neither of the two is specified, we don't tune anything, because we don't really know what to do. (There's a rough sketch of this defaulting logic at the end of this mail.)

So far I've collected a couple of Ceph flags that are worth tuning:

* mds_cache_memory_limit
* osd_memory_target

These flags will be passed at instantiation time to the MDS and OSD daemons. Since most of the daemons have some cache flag, it would be nice to unify them under a new option, --{daemon}-memory-target. (The second sketch below shows how the flags could be derived.)

I'm also exposing the POD properties via environment variables (POD_{MEMORY,CPU}_LIMIT and POD_{MEMORY,CPU}_REQUEST) that Ceph can consume later for more autotuning. (See the third sketch below.)

One other nice thing would be to report, when containerized, that the daemon is getting close to its cgroup memory limit: we could surface something in "ceph -s", or Ceph could re-adjust some of its internal values.

As part of that PR I'm also implementing failures based on memory.limit per daemon, so I need to know the minimum amount of memory we want to recommend in production. That's not an easy call, but we have to start somewhere.

Thanks!

–––––––––
Sébastien Han
Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."
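
To make the defaulting rules concrete, here's a rough Go sketch. The package and helper names are made up; only the Kubernetes API types are real, and this isn't the final Rook code:

package tuning // hypothetical package name

import (
	v1 "k8s.io/api/core/v1"
)

// podMemory applies the defaulting rules above and returns the effective
// (request, limit) pair in bytes: an omitted request falls back to the
// limit, and an omitted limit is reported as 0 (unbounded).
func podMemory(res v1.ResourceRequirements) (request, limit int64) {
	if q, ok := res.Limits[v1.ResourceMemory]; ok {
		limit = q.Value()
	}
	if q, ok := res.Requests[v1.ResourceMemory]; ok {
		request = q.Value()
	} else {
		request = limit // omitted request defaults to the limit
	}
	return request, limit
}

// shouldTune reports whether we know enough to tune anything at all:
// with neither value set we leave Ceph's own defaults untouched.
func shouldTune(request, limit int64) bool {
	return request != 0 || limit != 0
}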
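
Second, a sketch of how the per-daemon flags could be derived from those values. The Ceph option names are real; the sizing heuristic (prefer the request, fall back to the limit, and keep ~20% headroom for the MDS since mds_cache_memory_limit only bounds the cache, not the whole process) is just a strawman, not a settled recommendation:

package tuning // hypothetical package name

import "fmt"

// memoryFlags derives the command-line flag to pass to a daemon at
// instantiation time, given the effective request/limit in bytes.
func memoryFlags(daemon string, request, limit int64) []string {
	target := request
	if target == 0 {
		target = limit
	}
	if target == 0 {
		return nil // nothing to tune
	}
	switch daemon {
	case "osd":
		return []string{fmt.Sprintf("--osd-memory-target=%d", target)}
	case "mds":
		// mds_cache_memory_limit only bounds the cache, so leave some
		// headroom for the rest of the process (0.8 is an assumption).
		cache := int64(float64(target) * 0.8)
		return []string{fmt.Sprintf("--mds-cache-memory-limit=%d", cache)}
	}
	return nil
}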
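
Finally, the env var exposure can be done with the Kubernetes downward API (resourceFieldRef). Again a sketch, with the variable names from above:

package tuning // hypothetical package name

import v1 "k8s.io/api/core/v1"

// resourceEnvVars exposes the pod resource values to the daemon so that
// Ceph can consume them later for more autotuning. Leaving Divisor unset
// defaults it to 1, i.e. plain bytes for memory and whole cores for CPU.
func resourceEnvVars() []v1.EnvVar {
	fields := []struct{ name, resource string }{
		{"POD_MEMORY_LIMIT", "limits.memory"},
		{"POD_MEMORY_REQUEST", "requests.memory"},
		{"POD_CPU_LIMIT", "limits.cpu"},
		{"POD_CPU_REQUEST", "requests.cpu"},
	}
	env := make([]v1.EnvVar, 0, len(fields))
	for _, f := range fields {
		env = append(env, v1.EnvVar{
			Name: f.name,
			ValueFrom: &v1.EnvVarSource{
				ResourceFieldRef: &v1.ResourceFieldSelector{
					Resource: f.resource,
				},
			},
		})
	}
	return env
}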