Re: PROBLEM: Memory leaking when running kubernetes cronjobs

Daniel McGinnes <MCGINNES@xxxxxxxxxx> · Thu, 20 Sep 2018 18:28:13 +0100

Hi Roman,

Yeah, from what I could see Docker/Kube don't support cgroups v2 yet, so
it would be great if you could help with a patch for debugging.

I ran stress in parallel with the workload. drop_caches also cleared some
memory, but a lot was still left leaked even after that.

My plan is to let it run over the weekend as it is, so I can do a direct 
comparison with the run without the patches. Monday I'll drop_caches, then 
create some more regular ambient memory pressure and see what happens.
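
For the before/after drop_caches comparison, a small helper along these
lines could record how much memory actually comes back. This is only a
rough sketch: the function names are made up for illustration, and the
default field list is just an assumption about which /proc/meminfo
counters are worth watching here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: number}.

    Most fields are reported in kB; the number is kept as printed.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Take a snapshot of the current counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def meminfo_delta(before, after,
                  fields=("MemFree", "Cached", "Slab",
                          "SReclaimable", "SUnreclaim")):
    """Return after - before for each field present in both snapshots."""
    return {f: after[f] - before[f]
            for f in fields if f in before and f in after}
```

The idea would be to call read_meminfo() once, run
echo 3 > /proc/sys/vm/drop_caches as root, call read_meminfo() again, and
pass both snapshots to meminfo_delta().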

thanks,

Dan McGinnes

IBM Cloud - Containers performance

Int Tel: 247359        Ext Tel: 01962 817359

Notes: Daniel McGinnes/UK/IBM
Email: MCGINNES@xxxxxxxxxx

IBM (UK) Ltd, Hursley Park,Winchester,Hampshire, SO21 2JN

From:   Roman Gushchin <guro@xxxxxx>
To:     Daniel McGinnes <MCGINNES@xxxxxxxxxx>
Cc:     "cgroups@xxxxxxxxxxxxxxx" <cgroups@xxxxxxxxxxxxxxx>,
        Nathaniel Rockwell <nrockwell@xxxxxxxxxx>
Date:   20/09/2018 17:38
Subject:        Re: PROBLEM: Memory leaking when running kubernetes cronjobs

On Thu, Sep 20, 2018 at 08:23:06AM +0000, Daniel McGinnes wrote:
> Hi Roman,
> 
> unfortunately Kubernetes seems to be using version 1 cgroups, so I can't
> see that stat - I'll investigate if there's a way to get Kube to use V2 so
> we can check this..

Hi Daniel!

Yeah, it might not be so easy, AFAIK.
Alternatively, you can expose this cgroup v2 data in the v1 interface
using an off-stream patch, just for debugging. It should be pretty
straightforward; I can help with it, if necessary.

> 
> There wasn't memory pressure, I just ran it in a pretty controlled way
> when running the test - so initially it sounds like what I saw was
> expected. I then ran stress --vm 16 --vm-bytes 2147483648 which did create
> some memory pressure and I saw oom killer getting invoked - it seemed
> pretty similar behaviour to before where only a small amount of the "lost"
> memory was reclaimed... Maybe I was being too severe with stress and the
> memory would be reclaimed at a slower rate under more reasonable memory
> pressure?

So, did you run stress --vm after the main workload or in parallel?
Could you please try to create some ambient memory pressure?
Does echo 3 > /proc/sys/vm/drop_caches help to reclaim the memory?
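
By "ambient" pressure I mean something much gentler than stress --vm 16.
A minimal sketch of one way to do that, allocating memory slowly and
touching every page so it really becomes resident; apply_pressure and all
of its parameters are hypothetical names and defaults picked for
illustration, not an existing tool:

```python
import time


def apply_pressure(total_bytes, chunk_bytes=64 * 1024 * 1024,
                   pause_s=1.0, hold_s=60.0, page_size=4096):
    """Gradually allocate total_bytes, hold it, then release it.

    Returns the number of bytes that were allocated.
    """
    chunks = []
    allocated = 0
    while allocated < total_bytes:
        size = min(chunk_bytes, total_bytes - allocated)
        buf = bytearray(size)
        # Write one byte per page so the kernel has to back it with RAM.
        for offset in range(0, size, page_size):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
        time.sleep(pause_s)  # ramp up slowly instead of all at once
    time.sleep(hold_s)       # keep the pressure on for a while
    chunks.clear()           # drop the references so memory is freed
    return allocated
```

total_bytes would need to be sized relative to the free memory on the
node, so that reclaim kicks in without invoking the OOM killer.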

Thanks!

Roman
