[0.48.3] OSD memory leak when scrubbing

Sylvain Munaut <s.munaut@xxxxxxxxxxxxxxxxxxxx> · Tue, 22 Jan 2013 21:01:30 +0100

Hi,

Since putting ceph in prod, I have been experiencing a memory leak in
the OSDs, forcing me to restart them every 5 or 6 days. Without that,
the OSD process just keeps growing and eventually gets killed by the
OOM killer. (To make sure the growth wasn't "legitimate", I let one
grow up to 4G of RSS ...).

Here's, for example, the RSS usage of the 12 OSD processes over a few
hours: http://i.imgur.com/ZJxyldq.png

What I've just noticed is that if I look at the logs of the osd
process right when it grows, I can see it's scrubbing PGs from pool
#3. When it scrubs PGs from other pools, nothing really happens
memory-wise.
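
A minimal sketch of how the RSS side of this observation could be
recorded, assuming Linux /proc and that the daemons show up under the
process name ceph-osd. The function names are hypothetical and not part
of ceph. The timestamps are there so the growth can be lined up by hand
against the scrub entries in the OSD logs.

```python
import os
import time


def parse_status(status_text):
    """Split the text of /proc/<pid>/status into a {field: value} dict."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def rss_kb(fields):
    """Return VmRSS in kB, or None if the field is absent (kernel threads)."""
    value = fields.get("VmRSS")
    return int(value.split()[0]) if value else None


def osd_rss_snapshot(process_name="ceph-osd"):
    """Map pid -> RSS in kB for every running process named process_name."""
    snapshot = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if fields.get("Name") == process_name:
            kb = rss_kb(fields)
            if kb is not None:
                snapshot[int(entry)] = kb
    return snapshot


def log_rss(interval_s=60, samples=10, process_name="ceph-osd"):
    """Print one timestamped RSS line per matching process, per sample."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, kb in sorted(osd_rss_snapshot(process_name).items()):
            print(f"{stamp} pid={pid} rss_kb={kb}")
        time.sleep(interval_s)
```

Calling log_rss() on an OSD host would print lines such as
"<timestamp> pid=<pid> rss_kb=<value>", one per OSD per sample.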

Pool #3 is the pool where I have all the RBD images for the VMs, so it
sees a lot of small read/write/modify operations. The other pools are
used by RGW for object storage and are mostly write-once,
read-many-times with relatively large objects.

I'm planning to upgrade to 0.56.1 this weekend, and I was hoping
someone could tell me whether this issue has been fixed in the
scrubbing code?

I've seen other posts about memory leaks, but at the time the source
wasn't confirmed. Here I can clearly see it's the scrubbing of pools
that hold RBD images.

Cheers,

      Sylvain