Sebastien,

I just had to restart the OSD about 10 minutes ago, so it looks like all it did was slow down the process.

Dave Spano

----- Original Message -----
From: "Sébastien Han" <han.sebastien@xxxxxxxxx>
To: "Dave Spano" <dspano@xxxxxxxxxxxxxx>
Cc: "Greg Farnum" <greg@xxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>, "Sylvain Munaut" <s.munaut@xxxxxxxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx>, "Vladislav Gorbunov" <vadikgo@xxxxxxxxx>
Sent: Wednesday, March 13, 2013 3:59:03 PM
Subject: Re: OSD memory leaks?

Dave,

Just to be sure, did the log max recent=10000 _completely_ stop the memory leak or did it slow it down?

Thanks!
--
Regards,
Sébastien Han.

On Wed, Mar 13, 2013 at 2:12 PM, Dave Spano <dspano@xxxxxxxxxxxxxx> wrote:
> Lol. I'm totally fine with that. My glance images pool isn't used too often. I'm going to give that a try today and see what happens.
>
> I'm still crossing my fingers, but since I added log max recent=10000 to ceph.conf, I've been okay despite the improper pg_num, and a lot of scrubbing/deep scrubbing yesterday.
>
> Dave Spano
>
> ----- Original Message -----
>
> From: "Greg Farnum" <greg@xxxxxxxxxxx>
> To: "Dave Spano" <dspano@xxxxxxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>, "Sylvain Munaut" <s.munaut@xxxxxxxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx>, "Vladislav Gorbunov" <vadikgo@xxxxxxxxx>, "Sébastien Han" <han.sebastien@xxxxxxxxx>
> Sent: Tuesday, March 12, 2013 5:37:37 PM
> Subject: Re: OSD memory leaks?
>
> Yeah. There's not anything intelligent about that cppool mechanism. :)
> -Greg
>
> On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:
>
>> I'd rather shut the cloud down and copy the pool to a new one than take any chances of corruption by using an experimental feature.
>> My guess is that there cannot be any I/O to the pool while copying, otherwise you'll lose the changes that are happening during the copy, correct?
>>
>> Dave Spano
>> Optogenics
>> Systems Administrator
>>
>> ----- Original Message -----
>>
>> From: "Greg Farnum" <greg@xxxxxxxxxxx>
>> To: "Sébastien Han" <han.sebastien@xxxxxxxxx>
>> Cc: "Dave Spano" <dspano@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Sage Weil" <sage@xxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>, "Sylvain Munaut" <s.munaut@xxxxxxxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx>, "Vladislav Gorbunov" <vadikgo@xxxxxxxxx>
>> Sent: Tuesday, March 12, 2013 4:20:13 PM
>> Subject: Re: OSD memory leaks?
>>
>> On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:
>> > Well to avoid unnecessary data movement, there is also an
>> > _experimental_ feature to change on the fly the number of PGs in a pool.
>> >
>> > ceph osd pool set <poolname> pg_num <numpgs> --allow-experimental-feature
>> Don't do that. We've got a set of 3 patches which fix bugs we know about that aren't in bobtail yet, and I'm sure there's more we aren't aware of…
>> -Greg
>>
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> >
>> > Cheers!
>> > --
>> > Regards,
>> > Sébastien Han.
>> >
>> > On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano <dspano@xxxxxxxxxxxxxx> wrote:
>> > > Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed!
>> > >
>> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924
>> > >
>> > > Dave Spano
>> > > Optogenics
>> > > Systems Administrator
>> > >
>> > > ----- Original Message -----
>> > >
>> > > From: "Dave Spano" <dspano@xxxxxxxxxxxxxx>
>> > > To: "Sébastien Han" <han.sebastien@xxxxxxxxx>
>> > > Cc: "Sage Weil" <sage@xxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>, "Gregory Farnum" <greg@xxxxxxxxxxx>, "Sylvain Munaut" <s.munaut@xxxxxxxxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx>, "Vladislav Gorbunov" <vadikgo@xxxxxxxxx>
>> > > Sent: Tuesday, March 12, 2013 1:41:21 PM
>> > > Subject: Re: OSD memory leaks?
>> > >
>> > > If one were stupid enough to have their pg_num and pgp_num set to 8 on two of their pools, how could you fix that?
>> > >
>> > > Dave Spano
>> > >
>> > > ----- Original Message -----
>> > >
>> > > From: "Sébastien Han" <han.sebastien@xxxxxxxxx>
>> > > To: "Vladislav Gorbunov" <vadikgo@xxxxxxxxx>
>> > > Cc: "Sage Weil" <sage@xxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>, "Gregory Farnum" <greg@xxxxxxxxxxx>, "Sylvain Munaut" <s.munaut@xxxxxxxxxxxxxxxxxxxx>, "Dave Spano" <dspano@xxxxxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "Samuel Just" <sam.just@xxxxxxxxxxx>
>> > > Sent: Tuesday, March 12, 2013 9:43:44 AM
>> > > Subject: Re: OSD memory leaks?
>> > >
>> > > > Sorry, I mean pg_num and pgp_num on all pools. Shown by the "ceph osd
>> > > > dump | grep 'rep size'"
>> > >
>> > > Well it's still 450 each...
>> > >
>> > > > The default pg_num value 8 is NOT suitable for a big cluster.
>> > >
>> > > Thanks I know, I'm not new to Ceph. What's your point here? I
>> > > already said that pg_num was 450...
>> > > --
>> > > Regards,
>> > > Sébastien Han.
>> > >
>> > > On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
>> > > > Sorry, I mean pg_num and pgp_num on all pools. Shown by the "ceph osd
>> > > > dump | grep 'rep size'"
>> > > > The default pg_num value 8 is NOT suitable for a big cluster.
>> > > >
>> > > > 2013/3/13 Sébastien Han <han.sebastien@xxxxxxxxx>:
>> > > > > Replica count has been set to 2.
>> > > > >
>> > > > > Why?
>> > > > > --
>> > > > > Regards,
>> > > > > Sébastien Han.
>> > > > >
>> > > > > On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
>> > > > > > > FYI I'm using 450 pgs for my pools.
>> > > > > >
>> > > > > > Please, can you show the number of object replicas?
>> > > > > >
>> > > > > > ceph osd dump | grep 'rep size'
>> > > > > >
>> > > > > > Vlad Gorbunov
>> > > > > >
>> > > > > > 2013/3/5 Sébastien Han <han.sebastien@xxxxxxxxx>:
>> > > > > > > FYI I'm using 450 pgs for my pools.
>> > > > > > >
>> > > > > > > --
>> > > > > > > Regards,
>> > > > > > > Sébastien Han.
>> > > > > > >
>> > > > > > > On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > On Fri, 1 Mar 2013, Wido den Hollander wrote:
>> > > > > > > > > On 02/23/2013 01:44 AM, Sage Weil wrote:
>> > > > > > > > > > On Fri, 22 Feb 2013, Sébastien Han wrote:
>> > > > > > > > > > > Hi all,
>> > > > > > > > > > >
>> > > > > > > > > > > I finally got a core dump.
>> > > > > > > > > > >
>> > > > > > > > > > > I did it with a kill -SEGV on the OSD process.
>> > > > > > > > > > >
>> > > > > > > > > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>> > > > > > > > > > >
>> > > > > > > > > > > Hope we will get something out of it :-).
>> > > > > > > > > >
>> > > > > > > > > > AHA! We have a theory. The pg log isn't trimmed during scrub (because the
>> > > > > > > > > > old scrub code required that), but the new (deep) scrub can take a very
>> > > > > > > > > > long time, which means the pg log will eat RAM in the meantime..
>> > > > > > > > > > especially under high iops.
>> > > > > > > > >
>> > > > > > > > > Does the number of PGs influence the memory leak? So my theory is that when
>> > > > > > > > > you have a high number of PGs with a low number of objects per PG you don't
>> > > > > > > > > see the memory leak.
>> > > > > > > > >
>> > > > > > > > > I saw the memory leak on a RBD system where a pool had just 8 PGs, but after
>> > > > > > > > > going to 1024 PGs in a new pool it seemed to be resolved.
>> > > > > > > > >
>> > > > > > > > > I've asked somebody else to try your patch since he's still seeing it on his
>> > > > > > > > > systems. Hopefully that gives us some results.
>> > > > > > > >
>> > > > > > > > The PGs were active+clean when you saw the leak?
>> > > > > > > > There is a problem (that
>> > > > > > > > we just fixed in master) where pg logs aren't trimmed for degraded PGs.
>> > > > > > > >
>> > > > > > > > sage
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Wido
>> > > > > > > > >
>> > > > > > > > > > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> > > > > > > > > > if that seems to work? Note that that patch shouldn't be run in a mixed
>> > > > > > > > > > argonaut+bobtail cluster, since it isn't properly checking if the scrub is
>> > > > > > > > > > classic or chunky/deep.
>> > > > > > > > > >
>> > > > > > > > > > Thanks!
>> > > > > > > > > > sage
>> > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Sébastien Han.
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> > > > > > > > > > > > On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han <han.sebastien@xxxxxxxxx>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > Is osd.1 using the heap profiler as well? Keep in mind that active
>> > > > > > > > > > > > > > use
>> > > > > > > > > > > > > > of the memory profiler will itself cause memory usage to increase -
>> > > > > > > > > > > > > > this sounds a bit like that to me since it's staying stable at a
>> > > > > > > > > > > > > > large
>> > > > > > > > > > > > > > but finite portion of total memory.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Well, the memory consumption was already high before the profiler was
>> > > > > > > > > > > > > started. So yes with the memory profiler enabled an OSD might consume
>> > > > > > > > > > > > > more memory but this doesn't cause the memory leaks.
>> > > > > > > > > > > >
>> > > > > > > > > > > > My concern is that maybe you saw a leak but when you restarted with
>> > > > > > > > > > > > the memory profiling you lost whatever conditions caused it.
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Any ideas? Nothing to say about my scrubbing theory?
>> > > > > > > > > > > > I like it, but Sam indicates that without some heap dumps which
>> > > > > > > > > > > > capture the actual leak then scrub is too large to effectively code
>> > > > > > > > > > > > review for leaks. :(
>> > > > > > > > > > > > -Greg
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > > > > > > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > > > > > > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Wido den Hollander
>> > > > > > > > > 42on B.V.
>> > > > > > > > >
>> > > > > > > > > Phone: +31 (0)20 700 9902
>> > > > > > > > > Skype: contact42on
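
The pool-copy approach discussed in this thread (create a new pool with a larger pg_num, stop all client I/O, copy with the cppool mechanism) can be sketched as a dry run that only prints the commands. This is a sketch under assumptions, not a tested procedure: the helper name plan_pool_copy, the pool names, the PG count, and the rename-instead-of-delete step are illustrative and are not taken from the thread or confirmed against the linked post.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review and does not run them.
# The pool names and the PG count passed in are placeholders (assumptions),
# not values taken from the thread.

plan_pool_copy() {
    old="$1"      # existing pool that was created with too few PGs
    new="$2"      # temporary name for the replacement pool
    pg_num="$3"   # PG count for the new pool (used for pgp_num as well)

    # Create the replacement pool with the desired pg_num and pgp_num.
    echo "ceph osd pool create ${new} ${pg_num} ${pg_num}"
    # Copy all objects. Clients must be stopped first: writes made
    # during the copy are not carried over.
    echo "rados cppool ${old} ${new}"
    # Keep the original under another name instead of deleting it.
    echo "ceph osd pool rename ${old} ${old}-orig"
    echo "ceph osd pool rename ${new} ${old}"
}

plan_pool_copy images images-new 512
```

Printing the plan first leaves room to check the pool names and PG count before anything touches the cluster; the renamed original can be removed once the new pool has been verified.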