Re: OSD memory leaks?

On 02/23/2013 01:44 AM, Sage Weil wrote:
On Fri, 22 Feb 2013, Sébastien Han wrote:
Hi all,

I finally got a core dump.

I did it with a kill -SEGV on the OSD process.

https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008

Hope we will get something out of it :-).

AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the
old scrub code required that), but the new (deep) scrub can take a very
long time, which means the pg log will eat RAM in the meantime,
especially under high IOPS.


Does the number of PGs influence the memory leak? My theory is that with a high number of PGs and a low number of objects per PG, you don't see the memory leak.

I saw the memory leak on an RBD system where a pool had just 8 PGs, but after moving to a new pool with 1024 PGs it seemed to be resolved.
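
To put rough numbers on the theory above, here is a minimal back-of-the-envelope sketch in Python. It assumes that writes are spread evenly across PGs and that no log entries are trimmed while a PG is being scrubbed. The function name, its parameters, and the input figures are illustrative only; they are not taken from the Ceph code or from measurements in this thread.

```python
def pg_log_entries_after_scrub(pool_write_iops, scrub_seconds, num_pgs, baseline_entries):
    """Estimate one PG's log length at the end of a scrub with trimming paused.

    Assumptions (hypothetical model, not Ceph internals):
      - writes are spread evenly over all PGs in the pool
      - each write appends exactly one pg log entry
      - nothing is trimmed until the scrub finishes
    """
    per_pg_iops = pool_write_iops / num_pgs
    return baseline_entries + per_pg_iops * scrub_seconds


# Illustrative inputs only: 800 write IOPS, a 10-minute deep scrub,
# and 1000 entries already in the log when the scrub starts.
few_pgs = pg_log_entries_after_scrub(800, 600, 8, 1000)
many_pgs = pg_log_entries_after_scrub(800, 600, 1024, 1000)
print(f"8 PGs:    {few_pgs:.0f} entries")
print(f"1024 PGs: {many_pgs:.0f} entries")
```

Under these assumptions the log of a PG being scrubbed grows far less when the same write load is spread over many PGs, and a PG holding fewer objects would presumably also finish its scrub sooner. That would be consistent with the 8 PG versus 1024 PG observation, but it is only a model, not a confirmation.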

I've asked somebody else to try your patch since he's still seeing it on his systems. Hopefully that gives us some results.

Wido

Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
if that seems to work?  Note that the patch shouldn't be run in a mixed
argonaut+bobtail cluster, since it isn't properly checking if the scrub is
classic or chunky/deep.

Thanks!
sage


--
Regards,
Sébastien Han.


On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
Is osd.1 using the heap profiler as well? Keep in mind that active use
of the memory profiler will itself cause memory usage to increase;
this sounds a bit like that to me since it's staying stable at a large
but finite portion of total memory.

Well, the memory consumption was already high before the profiler was
started. So yes, with the memory profiler enabled an OSD might consume
more memory, but this doesn't cause the memory leaks.

My concern is that maybe you saw a leak but when you restarted with
the memory profiling you lost whatever conditions caused it.

Any ideas? Nothing to say about my scrubbing theory?
I like it, but Sam indicates that without some heap dumps which
capture the actual leak, scrub is too large to effectively code
review for leaks. :(
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on