Can anyone who hit this bug please confirm that your system contains
libc 2.15+?

On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> Oh nice, the pattern also matches paths :D -- I didn't know that.
> Thanks, Greg.
> --
> Regards,
> Sébastien Han.
>
>
> On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Set your /proc/sys/kernel/core_pattern file. :)
>> http://linux.die.net/man/5/core
>> -Greg
>>
>> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>> OK, I finally managed to get something on my test cluster;
>>> unfortunately, the dump goes to /.
>>>
>>> Any idea how to change the destination path?
>>>
>>> My production / won't be big enough...
>>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>>> ...and/or do you have the core path set interestingly, or one of the
>>>> core-trapping mechanisms turned on?
>>>>
>>>>
>>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>>
>>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>>
>>>>>> Hmm, I just tried several times on my test cluster and I can't get
>>>>>> any core dump. Does Ceph commit suicide or something? Is that the
>>>>>> expected behavior?
>>>>>
>>>>> SIGSEGV should trigger the usual path that dumps a stack trace and
>>>>> then dumps core. Was your ulimit -c set before the daemon was
>>>>> started?
>>>>>
>>>>> sage
>>>>>
>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sébastien Han.
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Loïc,
>>>>>>>
>>>>>>> Thanks for bringing our discussion to the ML. I'll check that
>>>>>>> tomorrow :-).
>>>>>>>
>>>>>>> Cheers
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Sébastien Han.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Loïc,
>>>>>>>>
>>>>>>>> Thanks for bringing our discussion to the ML. I'll check that
>>>>>>>> tomorrow :-).
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Sébastien Han.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD
>>>>>>>>> when it grows too much could be amended to dump core instead of
>>>>>>>>> just being killed and restarted. The binary plus the core could
>>>>>>>>> probably be used to figure out where the leak is.
>>>>>>>>>
>>>>>>>>> You should make sure the OSD's current working directory is on a
>>>>>>>>> file system with enough free disk space to accommodate the dump,
>>>>>>>>> and set
>>>>>>>>>
>>>>>>>>> ulimit -c unlimited
>>>>>>>>>
>>>>>>>>> before running it (your system default is probably ulimit -c 0,
>>>>>>>>> which inhibits core dumps). When you detect that the OSD has
>>>>>>>>> grown too much, kill it with
>>>>>>>>>
>>>>>>>>> kill -SEGV $pid
>>>>>>>>>
>>>>>>>>> and upload the core found in the working directory, together with
>>>>>>>>> the binary, to a public place. If the osd binary is compiled with
>>>>>>>>> -g but without changing the -O settings, you get a larger binary
>>>>>>>>> file but no negative impact on performance. Forensic analysis
>>>>>>>>> will be made a lot easier with the debugging symbols.
>>>>>>>>>
>>>>>>>>> My 2cts
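As a concrete illustration of the watchdog Loïc describes, here is a
minimal sketch. It assumes the daemon is named ceph-osd and uses the
VmRSS figure from /proc as the "grows too much" test; the 4 GB
threshold and 60-second poll interval are illustrative values, not
something specified in the thread:

    #!/bin/sh
    # Watchdog sketch: send SIGSEGV to any ceph-osd whose resident set
    # size exceeds a threshold, so the usual handler dumps a stack
    # trace and a core file.
    # NOTE: the OSDs themselves must have been started with
    # "ulimit -c unlimited" (see Sage's question above), otherwise
    # no core will be written.
    THRESHOLD_KB=4194304   # 4 GB, in kB as /proc reports it

    while true; do
        for pid in $(pidof ceph-osd); do
            rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
            if [ "${rss_kb:-0}" -gt "$THRESHOLD_KB" ]; then
                kill -SEGV "$pid"
            fi
        done
        sleep 60
    done

To keep the dump off the root file system (the core_pattern point near
the top of the thread), the pattern can name an absolute path, e.g. as
root:

    echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern

where /var/crash is an illustrative directory with enough free space;
see man 5 core for the % specifiers.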
>>>>>>>>>
>>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>>
>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>>
>>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>>
>>>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD
>>>>>>>>>>> memory for the 12 OSD processes over the last 3.5 days.
>>>>>>>>>>> Memory was rising every 24 h. I made the change yesterday
>>>>>>>>>>> around 13:00, and the OSDs stopped growing. OSD memory even
>>>>>>>>>>> seems to go down slowly, in small steps.
>>>>>>>>>>>
>>>>>>>>>>> Of course, I assume disabling scrubbing is not a long-term
>>>>>>>>>>> solution and I should re-enable it... (How do I do that, by
>>>>>>>>>>> the way? What were the default values for those parameters?)
>>>>>>>>>>
>>>>>>>>>> It depends on the exact commit you're on. You can see the
>>>>>>>>>> defaults if you do
>>>>>>>>>>
>>>>>>>>>> ceph-osd --show-config | grep osd_scrub
>>>>>>>>>>
>>>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>>>> reproduce.
>>>>>>>>>>
>>>>>>>>>> sage
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
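For completeness, a sketch of how the scrub intervals discussed above
could be inspected and restored. The numeric values below are only
placeholders, since, as Sage notes, the actual defaults depend on the
commit you are running; substitute the values your own build reports:

    # Show this build's compiled-in defaults:
    ceph-osd --show-config | grep osd_scrub

    # Re-enable scrubbing by injecting the reported defaults back into
    # every OSD (replace the values with the ones printed above):
    ceph osd tell \* injectargs '--osd-scrub-min-interval 86400'
    ceph osd tell \* injectargs '--osd-scrub-max-interval 604800'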