Hi,

TL;DR: we see "used" memory grow indefinitely on our OSD servers, up to the point that either 1) an OSD process gets killed by the OOM killer, or 2) an OSD aborts (probably because malloc cannot provide more RAM). I suspect a memory leak in the OSDs.

We were running 12.2.2. We are now running 12.2.3. Replicated setup, SIZE=3, MIN_SIZE=2. All servers were rebooted. The "used" memory is slowly, but steadily, growing.

ceph.conf:

bluestore_cache_size = 6G

ceph daemon osd.$daemon dump_mempools gives:

"total": {
    "items": 52925781,
    "bytes": 6058227868
...

for roughly all OSDs. So the OSD process is not "exceeding" what it *thinks* it's using.

We hadn't noticed this during the "pre-production" phase of the cluster. The main difference between "pre-production" and "production" is that we are using compression on the pool:

ceph osd pool set $pool compression_algorithm snappy
ceph osd pool set $pool compression_mode aggressive

I haven't seen any of you complaining about memory leaks besides the well-known leak in 12.2.1. How many of you are using compression like this? If it has anything to do with this at all ...

Currently at ~60 GB used with 2 days of uptime. 42 GB of RAM usage for all OSDs ... 18 GB leaked? If Ceph keeps releasing minor versions this quickly it will never really become a big problem ;-).

Any hints to analyse this issue?

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
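P.S. In case it helps, a rough sketch of how I put the kernel's view of each OSD next to its own mempool accounting (this assumes systemd-managed OSDs, i.e. ceph-osd@N units, and admin sockets in the default /var/run/ceph location; the JSON path matches the dump_mempools output above, so adjust it if your version nests things differently):

for sock in /var/run/ceph/ceph-osd.*.asok; do
    osd=$(echo "$sock" | sed 's/.*ceph-osd\.\([0-9]*\)\.asok/\1/')
    # main PID of the systemd unit for this OSD
    pid=$(systemctl show -p MainPID ceph-osd@"$osd" | cut -d= -f2)
    # kernel's view: resident set size, in kB
    rss_kb=$(awk '/^VmRSS:/ {print $2}' /proc/"$pid"/status)
    # the OSD's own view: total bytes tracked by its mempools
    mempool_b=$(ceph daemon osd."$osd" dump_mempools |
        python -c 'import json,sys; print(json.load(sys.stdin)["total"]["bytes"])')
    echo "osd.$osd rss=$((rss_kb / 1024)) MB, mempools=$((mempool_b / 1024 / 1024)) MB"
done

The gap between the two columns is what I'm calling "leaked" above.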