That pattern would seem to support the log trimming theory of the leak.
-Sam

On Fri, Mar 1, 2013 at 7:51 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 02/23/2013 01:44 AM, Sage Weil wrote:
>>
>> On Fri, 22 Feb 2013, Sébastien Han wrote:
>>>
>>> Hi all,
>>>
>>> I finally got a core dump.
>>>
>>> I did it with a kill -SEGV on the OSD process.
>>>
>>>
>>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>
>>> Hope we will get something out of it :-).
>>
>>
>> AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the
>> old scrub code required that), but the new (deep) scrub can take a very
>> long time, which means the pg log will eat RAM in the meantime,
>> especially under high IOPS.
>>
>
> Does the number of PGs influence the memory leak? My theory is that when
> you have a high number of PGs with a low number of objects per PG, you don't
> see the memory leak.
>
> I saw the memory leak on an RBD system where a pool had just 8 PGs, but after
> going to 1024 PGs in a new pool it seemed to be resolved.
>
> I've asked somebody else to try your patch since he's still seeing it on his
> systems. Hopefully that gives us some results.
>
> Wido
>
>
>> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> if that seems to work?  Note that that patch shouldn't be run in a mixed
>> argonaut+bobtail cluster, since it isn't properly checking if the scrub is
>> classic or chunky/deep.
>>
>> Thanks!
>> sage
>>
>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>> wrote:
>>>>>>
>>>>>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>>>>>> of the memory profiler will itself cause memory usage to increase -
>>>>>> this sounds a bit like that to me since it's staying stable at a large
>>>>>> but finite portion of total memory.
>>>>>
>>>>>
>>>>> Well, the memory consumption was already high before the profiler was
>>>>> started. So yes, with the memory profiler enabled an OSD might consume
>>>>> more memory, but this doesn't cause the memory leaks.
>>>>
>>>>
>>>> My concern is that maybe you saw a leak but when you restarted with
>>>> the memory profiling you lost whatever conditions caused it.
>>>>
>>>>> Any ideas? Nothing to say about my scrubbing theory?
>>>>
>>>> I like it, but Sam indicates that without some heap dumps which
>>>> capture the actual leak then scrub is too large to effectively code
>>>> review for leaks. :(
>>>> -Greg
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
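
For readers following the thread, the log trimming theory quoted above (the pg log is not trimmed while a scrub is running, so a long deep scrub under high IOPS lets it grow) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ceph code: the function name `simulate_pg_log`, its parameters, and the "one log entry per write op" model are all hypothetical and chosen purely for illustration.

```python
from collections import deque


def simulate_pg_log(total_ops, max_entries, scrub_start, scrub_end,
                    trim_during_scrub):
    """Return the peak length of a toy PG log (hypothetical model, not Ceph).

    Every op appends one entry. The log is trimmed back to `max_entries`
    after each op, except while a scrub is running (ops in the half-open
    range [scrub_start, scrub_end)) when `trim_during_scrub` is False.
    """
    log = deque()
    peak = 0
    for op in range(total_ops):
        log.append(op)
        scrubbing = scrub_start <= op < scrub_end
        if trim_during_scrub or not scrubbing:
            # Drop the oldest entries until the log fits the limit again.
            while len(log) > max_entries:
                log.popleft()
        peak = max(peak, len(log))
    return peak


if __name__ == "__main__":
    # Assumed numbers: a scrub that spans 8000 of 10000 write ops.
    for trim in (False, True):
        peak = simulate_pg_log(10_000, 100, 1_000, 9_000, trim)
        print(f"trim_during_scrub={trim}: peak log entries = {peak}")
```

In this model the peak log size with trimming suspended grows linearly with the number of ops that arrive during the scrub, which matches the "especially under high IOPS" part of the theory; with trimming allowed it stays at the configured limit.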