Developing best practices around Ceph daemons and Kubernetes memory limits

Blaine Gardner <BlGardner@xxxxxxxx> · Thu, 26 Mar 2020 16:31:29 +0000

I am trying to develop some best practices around setting Kubernetes Pod Memory Requests and Memory Limits for Ceph daemons. 

Setting a Pod Memory Request controls how Kubernetes schedules a pod: the scheduler only places the pod on a node whose unrequested memory can cover the request. Setting a Pod Memory Limit means the container may be OOM-killed if it exceeds the limit.
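To make the request/limit distinction concrete, here is a deliberately simplified Python model. The function names `fits_on_node` and `would_be_oom_killed` are hypothetical. The real kube-scheduler weighs many more factors than memory. The kernel also reclaims memory before it OOM-kills a container at its cgroup limit, and a pod with no limit can still be killed if the node as a whole runs out of memory.

```python
from typing import Iterable, Optional

def fits_on_node(allocatable_bytes: int, existing_requests: Iterable[int], new_request: int) -> bool:
    """Scheduling view (simplified): only memory *requests* are summed against node allocatable."""
    return sum(existing_requests) + new_request <= allocatable_bytes

def would_be_oom_killed(usage_bytes: int, limit_bytes: Optional[int]) -> bool:
    """Runtime view (simplified): usage above the limit makes the container an OOM-kill candidate.
    With no limit (None), the limit itself never triggers a kill."""
    return limit_bytes is not None and usage_bytes > limit_bytes

GiB = 1024 ** 3

# A node with 16 GiB allocatable already has 12 GiB of requests placed on it.
print(fits_on_node(16 * GiB, [8 * GiB, 4 * GiB], 4 * GiB))  # True
print(fits_on_node(16 * GiB, [8 * GiB, 4 * GiB], 6 * GiB))  # False
print(would_be_oom_killed(5 * GiB, 4 * GiB))  # True
print(would_be_oom_killed(5 * GiB, None))     # False
```

Note that limits play no part in the placement decision. They only matter once the container is running.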

Advice I got from Joao: Ceph monitors are most likely to over-use memory during recovery scenarios, and killing mons at that point for exceeding a limit may make the problem much worse. The best practice I have here is therefore to set only a memory request for Ceph mons, ideally 4GB.
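The mon recommendation amounts to a resources stanza that carries a request and no limit. The sketch below builds one as a Python dict. The shape is assumed to mirror the per-daemon `resources` section of a Rook CephCluster spec. The `mon_resources` helper is hypothetical, and interpreting "4GB" as the Kubernetes quantity `4Gi` is also an assumption.

```python
import json

def mon_resources(memory_request: str = "4Gi") -> dict:
    """Hypothetical helper: a memory request only, deliberately with no 'limits' key,
    so a mon is never killed for exceeding a limit during recovery."""
    return {"requests": {"memory": memory_request}}

# Serialize the way it might appear under the cluster spec's resources section.
print(json.dumps({"mon": mon_resources()}, indent=2))
```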

In the case of OSDs, things are a little more complex. OSDs read the POD_MEMORY_REQUEST and POD_MEMORY_LIMIT environment variables, which Rook sets inside Kubernetes pods, and tune their memory usage to fit. They target the smaller of POD_MEMORY_REQUEST and [POD_MEMORY_LIMIT * 0.8]. To my understanding, OSDs try aggressively to stay within their targets. Given that an OSD will be terminated and restarted by Kubernetes if its limit is set too low or if it begins to leak memory, what are the risks of setting (or not setting) Pod Memory Limits on OSDs?
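The targeting rule described above can be written out as a small function. This is a sketch of the rule as stated in this message, not Ceph's actual code. The function names are hypothetical, and the behavior when only one variable is set (use whichever is present) is an assumption.

```python
from typing import Mapping, Optional

def _read_bytes(env: Mapping[str, str], name: str) -> Optional[int]:
    """Parse a byte count from the environment; missing, empty, or non-positive means unset."""
    value = env.get(name)
    if not value:
        return None
    n = int(value)
    return n if n > 0 else None

def effective_osd_memory_target(env: Mapping[str, str]) -> Optional[int]:
    """min(POD_MEMORY_REQUEST, POD_MEMORY_LIMIT * 0.8), using whichever values are present."""
    candidates = []
    request = _read_bytes(env, "POD_MEMORY_REQUEST")
    limit = _read_bytes(env, "POD_MEMORY_LIMIT")
    if request is not None:
        candidates.append(request)
    if limit is not None:
        # 0.8 expressed as 8/10 to stay in integer arithmetic.
        candidates.append(limit * 8 // 10)
    return min(candidates) if candidates else None

GiB = 1024 ** 3
# Request 4 GiB, limit 8 GiB: min(4 GiB, 6.4 GiB) -> the request wins.
print(effective_osd_memory_target({"POD_MEMORY_REQUEST": str(4 * GiB),
                                   "POD_MEMORY_LIMIT": str(8 * GiB)}))
```

The gap between the target and the limit (here at least 20% of the limit) is the only headroom an OSD has before Kubernetes kills it.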
  - One risk I can imagine: if OSDs are all started at nearly the same time and experience similar loads, they may leak memory at similar rates and be killed by Kubernetes at about the same time. Such a stampeding herd of memory leaks followed by limit terminations could ripple outward and make other OSDs unstable.
  - Not setting a limit might mean that a leaking OSD causes OOM situations for other daemons, or for the Kubernetes kubelet if the node's settings don't reserve some resources for the kubelet.
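A back-of-the-envelope model of the stampede concern in the first point. It assumes every OSD sits exactly at its memory target and then leaks at the same constant rate. The 64 MiB/hour leak rate, the 4 GiB target, the 5 GiB limit, and the helper names are all hypothetical illustration values.

```python
from typing import List, Optional

def hours_until_limit(limit_bytes: int, target_bytes: int, leak_bytes_per_hour: float) -> Optional[float]:
    """Headroom between the OSD's memory target and its pod limit, divided by a steady leak rate."""
    if leak_bytes_per_hour <= 0:
        return None  # no leak: under this model the limit is never reached
    return (limit_bytes - target_bytes) / leak_bytes_per_hour

def kill_times(start_offsets_h: List[float], limit_bytes: int, target_bytes: int,
               leak_bytes_per_hour: float) -> List[float]:
    """Hour at which each OSD would hit its limit, given when it started."""
    t = hours_until_limit(limit_bytes, target_bytes, leak_bytes_per_hour)
    if t is None:
        return []
    return [offset + t for offset in start_offsets_h]

GiB = 1024 ** 3
MiB = 1024 ** 2
limit = 5 * GiB      # hypothetical pod limit
target = 4 * GiB     # min(4 GiB request, 0.8 * 5 GiB)
leak = 64 * MiB      # hypothetical leak rate per hour

# Three OSDs started 0, 15, and 30 minutes apart.
print(kill_times([0.0, 0.25, 0.5], limit, target, leak))  # [16.0, 16.25, 16.5]
```

Under these assumptions, OSDs that start within 30 minutes of each other also hit their limits within 30 minutes of each other. Without a limit the same leak is not cut off at all, so it keeps growing until the node itself runs short of memory.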

What are the risks of killing other daemons once they exceed a particular limit? Is it good to kill daemons that exceed a limit in order to keep memory leaks from affecting the rest of the system? What about MDS, RGW, MGR, and NFS-Ganesha?

If anyone has informed recommendations for any of these daemons, I'd love your input. Please reply-all so that I get replies straight to my inbox.
Blaine
_______________________________________________
Dev mailing list -- dev@xxxxxxx