Possible memory leak in Ceph 14.0.1-1022.gc881d63

Hi,


The ceph-nano/container project lets users run almost "master" code inside containers.

We consume the latest packages to test the upcoming Ceph release. Obviously that's not "supported", but it is interesting for testing.

In this example, the container has been built with the Ceph 14.0.1-1022.gc881d63 RPMs on a CentOS base container.

A user reported an OOM in one of his testing containers (https://github.com/ceph/cn/issues/94).

Yes, the container is limited to 512 MB of RAM in Ceph-nano. That isn't much, but I wondered to what extent that limit was the root cause of this OOM.


I booted the same container twice and monitored its memory usage in two contexts (a small polling sketch follows the list):

- an idle context as a reference

- a constant rados workload to see the impact of IOs on this issue
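
For reference, the memory sampling can be done with a minimal polling loop like this Python sketch; the container name "ceph-nano", the 60 s interval and the output file are just placeholders, not what Ceph-nano itself uses:

    #!/usr/bin/env python
    # Minimal sketch: poll a container's memory usage with "docker stats"
    # and append it to a CSV that can be plotted later.
    # Container name, interval and output file are placeholders.
    import subprocess
    import time

    CONTAINER = "ceph-nano"   # hypothetical container name
    INTERVAL = 60             # seconds between samples

    with open("memory.csv", "a") as log:
        while True:
            # --no-stream returns a single sample instead of a live display
            usage = subprocess.check_output(
                ["docker", "stats", "--no-stream",
                 "--format", "{{.MemUsage}}", CONTAINER]).decode().strip()
            log.write("%d,%s\n" % (time.time(), usage))
            log.flush()
            time.sleep(INTERVAL)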


The Rados load was a simple loop: add the Ceph source tree as objects, remove them, and restart the loop.
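
Roughly, the workload looks like this Python sketch; the pool name "rbd" and the path to the Ceph tree are assumptions, and the rados CLI is simply driven through subprocess:

    #!/usr/bin/env python
    # Minimal sketch of the workload: push every file of a source tree as a
    # RADOS object, delete the objects, and start over.
    # The pool name and tree path are placeholders.
    import os
    import subprocess

    POOL = "rbd"               # hypothetical pool name
    TREE = "/usr/src/ceph"     # hypothetical path to the Ceph tree

    while True:
        objects = []
        for root, _, files in os.walk(TREE):
            for name in files:
                path = os.path.join(root, name)
                obj = path.lstrip("/").replace("/", "_")
                subprocess.check_call(["rados", "-p", POOL, "put", obj, path])
                objects.append(obj)
        for obj in objects:
            subprocess.check_call(["rados", "-p", POOL, "rm", obj])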

I plotted the results (you can download the graph at http://pubz.free.fr/leak.png).


It appears that:

- the idle cluster reached the 504 MB memory limit in 782 minutes, i.e. a 231 MB increase over 782 minutes (17.7 MB/hour)

- the working cluster reached the 500 MB memory limit (when the OOM killer took down ceph-mon) in 692 minutes, i.e. a 229 MB increase over 692 minutes (19.85 MB/hour)


It really looks like we leak a lot of memory in this version, and since the container's memory is very limited, that drives it into an OOM state and it dies.

Has any of you seen something similar?


My next step is to monitor every Ceph process during that time to see which process is growing too fast (something along the lines of the sketch below).
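
A minimal Python sketch of that per-process sampling, assuming the daemons all show up in ps as "ceph-*"; the interval and output file are placeholders:

    #!/usr/bin/env python
    # Minimal sketch: sample the RSS (in KiB) of every ceph-* process
    # so the fastest-growing daemon can be spotted over time.
    # Interval and output file are placeholders.
    import subprocess
    import time

    INTERVAL = 60  # seconds between samples

    with open("ceph-rss.csv", "a") as log:
        while True:
            ps = subprocess.check_output(
                ["ps", "-eo", "comm,pid,rss"]).decode().splitlines()
            now = int(time.time())
            for line in ps:
                fields = line.split()
                if fields and fields[0].startswith("ceph-"):
                    comm, pid, rss = fields[0], fields[1], fields[2]
                    log.write("%d,%s,%s,%s\n" % (now, comm, pid, rss))
            log.flush()
            time.sleep(INTERVAL)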


Erwan,





