Hi Stefan,
can you disable compression and check whether memory is still leaking?
If it stops, then the issue is definitely somewhere along the "compress"
path.
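Something along these lines should do it per pool, assuming the pool is
still the one you enabled compression on in the commands quoted below:

ceph osd pool set $pool compression_mode none

and then watch whether the OSDs' resident memory keeps growing at the
same rate.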
Thanks,
Igor
On 2/28/2018 6:18 PM, Stefan Kooman wrote:
Hi,
TL;DR: we see "used" memory grow indefinitely on our OSD servers, until
either 1) an OSD process gets killed by the OOM killer, or 2) the OSD
aborts (probably because malloc cannot provide more RAM). I suspect a
memory leak in the OSDs.
We were running 12.2.2 and are now running 12.2.3. Replicated setup,
SIZE=3, MIN_SIZE=2. All servers were rebooted. The "used" memory is
slowly but steadily growing.
ceph.conf:
bluestore_cache_size=6G
ceph daemon osd.$daemon dump_mempools gives:

    "total": {
        "items": 52925781,
        "bytes": 6058227868
... for roughly all OSDs. So the OSD process is not "exceeding" what it
*thinks* it's using.
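To put those mempool numbers next to what the kernel and the allocator
see, I can run something like the following (assuming the admin socket is
reachable and the OSDs are built against tcmalloc, the default):

# mempool accounting as seen by the OSD itself
ceph daemon osd.$daemon dump_mempools | grep -A 3 '"total"'

# resident memory of the ceph-osd processes, for comparison
ps -C ceph-osd -o pid,rss,cmd

# tcmalloc's view, including memory sitting in its freelists
ceph tell osd.$daemon heap stats

If the heap stats were to show a large freelist, "ceph tell osd.$daemon
heap release" would hand that memory back to the OS and rule out a plain
allocator artefact.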
We haven't noticed this during the "pre-production" phase of the cluster. The main
difference between "pre-production" and "production" is that we are using
"compression" on the pool:
ceph osd pool set $pool compression_algorithm snappy
ceph osd pool set $pool compression_mode aggressive
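The active settings can be read back to double-check (assuming "ceph osd
pool get" exposes these options on this release):

ceph osd pool get $pool compression_algorithm
ceph osd pool get $pool compression_mode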
I haven't seen any of you complaining about memory leaks besides the well-known
leak in 12.2.1. How many of you are using compression like this? If it has
anything to do with this at all ...
Currently at ~60 GB used after 2 days of uptime, with ~42 GB of RAM accounted for
by all the OSDs ... 18 GB leaked?
If Ceph keeps releasing minor versions so quickly it will never really become a
big problem ;-).
Any hints on how to analyse this issue?
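One avenue I could try myself, assuming our OSDs are linked against
tcmalloc (the default build), is the built-in heap profiler:

ceph tell osd.$daemon heap start_profiler
# ... let it run while "used" keeps growing ...
ceph tell osd.$daemon heap dump
ceph tell osd.$daemon heap stop_profiler

The dumps should land next to the OSD log (osd.$daemon.profile.*.heap)
and can be inspected with google-pprof to see which call paths hold the
memory.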
Gr. Stefan