oh nice, the pattern also matches path :D, didn't know that

thanks Greg
--
Regards,
Sébastien Han.


On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Set your /proc/sys/kernel/core_pattern file. :)
> http://linux.die.net/man/5/core
> -Greg
>
> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>> ok I finally managed to get something on my test cluster,
>> unfortunately, the dump goes to /
>>
>> any idea to change the destination path?
>>
>> My production / won't be big enough...
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>> ...and/or do you have the corepath set interestingly, or one of the
>>> core-trapping mechanisms turned on?
>>>
>>>
>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>
>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>
>>>>> Hum just tried several times on my test cluster and I can't get any
>>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>>> behavior?
>>>>
>>>>
>>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>>> dumps core. Was your ulimit -c set before the daemon was started?
>>>>
>>>> sage
>>>>
>>>>
>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi Loïc,
>>>>>>
>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>>> :-).
>>>>>>
>>>>>> Cheers
>>>>>> --
>>>>>> Regards,
>>>>>> Sébastien Han.
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Loïc,
>>>>>>>
>>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>>>> :-).
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Sébastien Han.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>>>> grows too much could be amended to core dump instead of just being killed &
>>>>>>>> restarted. The binary + core could probably be used to figure out where the
>>>>>>>> leak is.
>>>>>>>>
>>>>>>>> You should make sure the OSD current working directory is in a file system
>>>>>>>> with enough free disk space to accommodate the dump and set
>>>>>>>>
>>>>>>>>     ulimit -c unlimited
>>>>>>>>
>>>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>>>> inhibits core dumps). When you detect that the OSD grows too much, kill it
>>>>>>>> with
>>>>>>>>
>>>>>>>>     kill -SEGV $pid
>>>>>>>>
>>>>>>>> and upload the core found in the working directory, together with the
>>>>>>>> binary, to a public place. If the osd binary is compiled with -g but without
>>>>>>>> changing the -O settings, you should have a larger binary file but no
>>>>>>>> negative impact on performance. Forensic analysis will be made a lot
>>>>>>>> easier with the debugging symbols.
>>>>>>>>
>>>>>>>> My 2cts
>>>>>>>>
>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>
>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>
>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>
>>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>>>> and OSDs stopped growing.
>>>>>>>>>> OSD memory even seems to go down slowly by small blocks.
>>>>>>>>>>
>>>>>>>>>> Of course I assume disabling scrubbing is not a long term solution and
>>>>>>>>>> I should re-enable it ... (how do I do that btw ? what were the
>>>>>>>>>> default values for those parameters)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It depends on the exact commit you're on. You can see the defaults if
>>>>>>>>> you do
>>>>>>>>>
>>>>>>>>>     ceph-osd --show-config | grep osd_scrub
>>>>>>>>>
>>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>>> reproduce.
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
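
For anyone finding this thread later, here is a rough sketch of how the pieces discussed here (ulimit -c, core_pattern, kill -SEGV) might fit into a small watchdog. It is not the script mentioned in the thread, which was never posted. The function names rss_kb, over_limit and dump_if_too_big are made up for illustration, and /var/crash is only an assumed destination directory. It reads the resident size from /proc/<pid>/status, so it assumes Linux.

```shell
#!/bin/sh
# Prerequisites, per the thread (run these yourself, they are not executed here):
#   ulimit -c unlimited     # in the shell that starts the daemon, before it starts
#   echo '/var/crash/core.%e.%p.%t' > /proc/sys/kernel/core_pattern   # as root;
#       /var/crash is an assumed path, pick a file system with enough free space

# Print the VmRSS value (in kB) found in /proc/<pid>/status text read from stdin.
rss_kb() {
    awk '/^VmRSS:/ { print $2; exit }'
}

# Succeed when the first integer is strictly greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Send SIGSEGV to pid $1 when its resident memory exceeds $2 kB,
# so the process dumps core instead of simply being killed.
dump_if_too_big() {
    pid=$1
    limit_kb=$2
    rss=$(rss_kb < "/proc/$pid/status") || return 1
    [ -n "$rss" ] || return 1
    if over_limit "$rss" "$limit_kb"; then
        kill -SEGV "$pid"
    fi
}
```

For example, `dump_if_too_big 12345 4194304` would signal PID 12345 only once its resident memory is above 4 GiB. Calling it from cron or a sleep loop is left out on purpose, since how often to poll depends on how fast the OSD grows.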