+1

--
Regards,
Sébastien Han.


On Sat, Feb 16, 2013 at 10:09 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 02/16/2013 08:09 AM, Andrey Korolyov wrote:
>>
>> Can anyone who hit this bug please confirm that your system contains libc
>> 2.15+?
>>
>
> I've seen this with 0.56.2 as well on Ubuntu 12.04. Ubuntu 12.04 comes with
> 2.15-0ubuntu10.3
>
> Haven't gotten around to adding a heap profiler to it.
>
> Wido
>
>
>> On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>
>>> oh nice, the pattern also matches path :D, didn't know that
>>> thanks Greg
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>
>>>> Set your /proc/sys/kernel/core_pattern file. :)
>>>> http://linux.die.net/man/5/core
>>>> -Greg
>>>>
>>>> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>
>>>>> ok I finally managed to get something on my test cluster,
>>>>> unfortunately, the dump goes to /
>>>>>
>>>>> any idea to change the destination path?
>>>>>
>>>>> My production / won't be big enough...
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> ...and/or do you have the corepath set interestingly, or one of the
>>>>>> core-trapping mechanisms turned on?
>>>>>>
>>>>>>
>>>>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>>>>
>>>>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>>>>
>>>>>>>> Hum just tried several times on my test cluster and I can't get any
>>>>>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>>>>>> behavior?
>>>>>>>
>>>>>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>>>>>> dumps core. Was your ulimit -c set before the daemon was started?
>>>>>>>
>>>>>>> sage
>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Sébastien Han.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi Loïc,
>>>>>>>>>
>>>>>>>>> Thanks for bringing our discussion on the ML. I'll check that
>>>>>>>>> tomorrow :-).
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Sébastien Han.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Loïc,
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing our discussion on the ML. I'll check that
>>>>>>>>>> tomorrow :-).
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Sébastien Han.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>>>>>>> grows too much could be amended to core dump instead of just being
>>>>>>>>>>> killed & restarted. The binary + core could probably be used to figure
>>>>>>>>>>> out where the leak is.
>>>>>>>>>>>
>>>>>>>>>>> You should make sure the OSD current working directory is in a file
>>>>>>>>>>> system with enough free disk space to accommodate the dump and set
>>>>>>>>>>>
>>>>>>>>>>> ulimit -c unlimited
>>>>>>>>>>>
>>>>>>>>>>> before running it ( your system default is probably ulimit -c 0 which
>>>>>>>>>>> inhibits core dumps ). When you detect that OSD grows too much kill it
>>>>>>>>>>> with
>>>>>>>>>>>
>>>>>>>>>>> kill -SEGV $pid
>>>>>>>>>>>
>>>>>>>>>>> and upload the core found in the working directory, together with the
>>>>>>>>>>> binary in a public place.
>>>>>>>>>>> If the osd binary is compiled with -g but without
>>>>>>>>>>> changing the -O settings, you should have a larger binary file but no
>>>>>>>>>>> negative impact on performance. Forensic analysis will be made a lot
>>>>>>>>>>> easier with the debugging symbols.
>>>>>>>>>>>
>>>>>>>>>>> My 2cts
>>>>>>>>>>>
>>>>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>>>>
>>>>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>>>>
>>>>>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>>>>>>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>>>>>>>>>>>>> small blocks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course I assume disabling scrubbing is not a long term solution and
>>>>>>>>>>>>> I should re-enable it ... (how do I do that btw ? what were the
>>>>>>>>>>>>> default values for those parameters)
>>>>>>>>>>>>
>>>>>>>>>>>> It depends on the exact commit you're on. You can see the defaults if
>>>>>>>>>>>> you do
>>>>>>>>>>>>
>>>>>>>>>>>> ceph-osd --show-config | grep osd_scrub
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>>>>>> reproduce.
>>>>>>>>>>>>
>>>>>>>>>>>> sage
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
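
For anyone wanting to try the approach discussed in this thread (ulimit -c, core_pattern, then kill -SEGV once the OSD grows too much), here is a minimal sketch of such a watchdog. It is only an illustration under stated assumptions, not the script Sébastien actually wrote: the function names, the RSS_LIMIT_KB variable, its 4 GiB default and the /var/crash path in the comments are all hypothetical. Only ulimit -c unlimited, /proc/sys/kernel/core_pattern and kill -SEGV come from the messages above.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: make a process dump core once its memory
# use exceeds a threshold, instead of just killing and restarting it.
#
# Setup suggested in the thread (do this before the daemon is started):
#   ulimit -c unlimited     # in the shell or init script that starts ceph-osd
#   echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern   # as root
# /var/crash is an assumed path; choose a file system with enough free space.
# %e, %p and %t (executable name, pid, timestamp) are described in core(5).

RSS_LIMIT_KB="${RSS_LIMIT_KB:-4194304}"   # assumed default threshold: 4 GiB

# Print the resident set size of a pid in kB, as reported by /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Succeed when the first value (kB) is strictly greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Send SIGSEGV to the pid if its RSS exceeds the limit (second argument,
# defaulting to RSS_LIMIT_KB). Does nothing if the process is gone.
dump_if_too_big() {
    pid="$1"
    limit="${2:-$RSS_LIMIT_KB}"
    rss="$(rss_kb "$pid")"
    if [ -n "$rss" ] && over_limit "$rss" "$limit"; then
        echo "pid $pid uses $rss kB (limit $limit kB), sending SIGSEGV"
        kill -SEGV "$pid"
    fi
}
```

One possible use from cron, assuming pidof is available and the limit suits your hardware: for pid in $(pidof ceph-osd); do dump_if_too_big "$pid"; done. The threshold is passed as an argument so it can be tried on a harmless process first.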