Re: Ceph, container and memory

On 3/5/19 11:51 AM, Patrick Donnelly wrote:
On Tue, Mar 5, 2019 at 8:41 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:
If memory.requests is omitted for a container, it defaults to limits.
If memory.limits is not set, it defaults to 0 (unbounded).
If neither of the two is specified, then we don't tune anything, because
we don't really know what to do.
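That defaulting logic can be sketched as a small helper (a sketch for illustration; `resolve_memory_target` is a hypothetical name, not a Rook or Kubernetes API):

```python
def resolve_memory_target(requests=None, limits=None):
    """Pick the value to tune Ceph against, mirroring the rules above:
    - memory.requests omitted -> fall back to memory.limits
    - neither set -> return None, i.e. don't tune anything
    Values are bytes; None means "not specified"."""
    if requests is None and limits is None:
        return None          # neither specified: no tuning
    if requests is None:
        return limits        # requests defaults to limits
    return requests          # prefer the request value
```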

So far I've collected a couple of Ceph flags that are worth tuning:

* mds_cache_memory_limit
* osd_memory_target

These flags will be passed at instantiation time to the MDS and OSD daemons.
Since most of the daemons have some cache flag, it would be nice to unify
them under a new option, --{daemon}-memory-target.
Currently I'm also exposing pod properties via environment variables that
Ceph can consume later for more autotuning (POD_{MEMORY,CPU}_LIMIT,
POD_{MEMORY,CPU}_REQUEST).
Ignoring mds_cache_memory_limit for now; I think we should wait until we
have mds_memory_target before doing any magic there.

For the osd_memory_target, though, I think we could make the OSD pick up
on the POD_MEMORY_REQUEST variable and, if present, set osd_memory_target
to that value.  Or, instead of putting the burden on ceph, simply have
rook pass --osd-memory-target on the command line, or (post-startup) do
'ceph daemon osd.N config set osd_memory_target ...'.  (The advantage of
the latter is that it can more easily be overridden at runtime.)
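The post-startup variant Sage describes could look like this on the Rook side (a sketch only: the helper name and the fallback to POD_MEMORY_REQUEST are assumptions from this thread, though `ceph daemon osd.N config set osd_memory_target` is the real admin-socket command):

```python
import os

def osd_memory_target_cmd(osd_id, target_bytes=None):
    """Build the runtime-override command: set osd_memory_target via
    the OSD's admin socket.  When no explicit target is given, fall
    back to the POD_MEMORY_REQUEST env var mentioned earlier."""
    if target_bytes is None:
        target_bytes = os.environ.get("POD_MEMORY_REQUEST")
    if target_bytes is None:
        return None  # nothing specified: don't tune
    return ["ceph", "daemon", f"osd.{osd_id}",
            "config", "set", "osd_memory_target", str(target_bytes)]
```

Because this goes through the admin socket rather than the command line, the value can be changed again at runtime, which is the advantage noted above.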
Is POD_MEMORY_LIMIT|REQUEST standardized somewhere? Using an
environment variable to communicate resource restrictions is useful
but also hard to change on the fly. Can we (Ceph) read this
information from the cgroup the Ceph daemon has been assigned to?
Reducing the amount of configuration is one of our goals, so if we can
make Ceph more aware of its environment as far as resource
constraints go, we should take that route.
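Reading the limit from the cgroup is straightforward to sketch (the paths below are the standard cgroup v2 and v1 memory-controller files; the helper itself is illustrative, not actual Ceph code):

```python
def parse_cgroup_limit(raw):
    """Parse a cgroup memory limit file.  cgroup v2 writes the literal
    string "max" when no limit is set; otherwise the value is bytes.
    (cgroup v1 signals "no limit" with a huge sentinel number instead,
    which callers would need to treat as unbounded.)"""
    raw = raw.strip()
    if raw == "max":
        return None
    return int(raw)

def cgroup_memory_limit():
    """Best-effort read of this process's memory limit, trying the
    cgroup v2 file first, then the v1 file."""
    for path in ("/sys/fs/cgroup/memory.max",
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        try:
            with open(path) as f:
                return parse_cgroup_limit(f.read())
        except OSError:
            continue
    return None  # not running under a recognizable cgroup layout
```

Unlike an environment variable, this reflects changes made to the cgroup after the daemon has started.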

The MDS should self-configure mds_cache_memory_limit based on
memory.requests. That takes the magic formula out of the hands of
users and forgetful devs :)
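The "magic formula" could be as simple as reserving a fixed fraction of memory.requests for the cache; the 0.8 ratio below is purely a hypothetical illustration, not what Ceph or this proposal actually specifies:

```python
def mds_cache_limit_from_request(request_bytes, cache_ratio=0.8):
    """Derive mds_cache_memory_limit from the pod's memory.requests,
    leaving headroom for non-cache MDS memory.  The ratio is an
    illustrative assumption, not a value from this thread."""
    return int(request_bytes * cache_ratio)
```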

I'm not sure we have any specific action on the POD_MEMORY_LIMIT value;
the OSD should really be aiming for the REQUEST value instead.
I agree we should focus on memory.requests.


I mentioned it in another doc, but I suspect it would be fairly easy to adapt the osd_memory_target and autotuner code to work in the MDS, so long as we can adjust mds_cache_memory_limit on the fly.  It would be really nice to have all of the daemons conform to a standard *_memory_target interface.  That's useful both inside a container environment and on bare metal.


Mark



