Set your /proc/sys/kernel/core_pattern file. :)
http://linux.die.net/man/5/core
-Greg

On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> ok I finally managed to get something on my test cluster,
> unfortunately, the dump goes to /
>
> any idea how to change the destination path?
>
> My production / won't be big enough...
>
> --
> Regards,
> Sébastien Han.
>
>
> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>> ...and/or do you have the corepath set interestingly, or one of the
>> core-trapping mechanisms turned on?
>>
>>
>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>
>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>
>>>> Hum just tried several times on my test cluster and I can't get any
>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>> behavior?
>>>
>>>
>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>> dumps core. Was your ulimit -c set before the daemon was started?
>>>
>>> sage
>>>
>>>
>>>> --
>>>> Regards,
>>>> Sébastien Han.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Hi Loïc,
>>>>>
>>>>> Thanks for bringing our discussion on the ML. I'll check that tomorrow
>>>>> :-).
>>>>>
>>>>> Cheers
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi Loïc,
>>>>>>
>>>>>> Thanks for bringing our discussion on the ML. I'll check that tomorrow
>>>>>> :-).
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sébastien Han.
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when
>>>>>>> it grows too much could be amended to core dump instead of just being
>>>>>>> killed & restarted. The binary + core could probably be used to
>>>>>>> figure out where the leak is.
>>>>>>>
>>>>>>> You should make sure the OSD current working directory is in a file
>>>>>>> system with enough free disk space to accommodate the dump and set
>>>>>>>
>>>>>>> ulimit -c unlimited
>>>>>>>
>>>>>>> before running it ( your system default is probably ulimit -c 0 which
>>>>>>> inhibits core dumps ). When you detect that the OSD has grown too
>>>>>>> much, kill it with
>>>>>>>
>>>>>>> kill -SEGV $pid
>>>>>>>
>>>>>>> and upload the core found in the working directory, together with the
>>>>>>> binary, in a public place. If the osd binary is compiled with -g but
>>>>>>> without changing the -O settings, you should have a larger binary
>>>>>>> file but no negative impact on performance. Forensic analysis will be
>>>>>>> made a lot easier with the debugging symbols.
>>>>>>>
>>>>>>> My 2cts
>>>>>>>
>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>
>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I disabled scrubbing using
>>>>>>>>>
>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>
>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>
>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>>>>>>>>> small blocks.
>>>>>>>>>
>>>>>>>>> Of course I assume disabling scrubbing is not a long term solution
>>>>>>>>> and I should re-enable it ... (how do I do that btw ? what were the
>>>>>>>>> default values for those parameters)
>>>>>>>>
>>>>>>>>
>>>>>>>> It depends on the exact commit you're on. You can see the defaults
>>>>>>>> if you do
>>>>>>>>
>>>>>>>> ceph-osd --show-config | grep osd_scrub
>>>>>>>>
>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>> reproduce.
>>>>>>>>
>>>>>>>> sage
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
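
For reference, the steps discussed in this thread (pointing core_pattern at a larger file system, raising ulimit -c before the daemon starts, then sending SIGSEGV) can be sketched as a small shell helper. This is only a sketch under assumptions and is not taken from any of the messages: /var/crash is a hypothetical dump directory, core.%e.%p.%t is just one possible naming pattern built from the specifiers documented in core(5), and the function names are made up for illustration. The privileged commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical dump directory (an assumption): pick a file system with
# enough free space to hold a full OSD core.
CORE_DIR="${CORE_DIR:-/var/crash}"

# Build a core_pattern value for directory $1.
# %e = executable name, %p = PID, %t = time of dump (see core(5)).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "$1"
}

# Succeed if the file system holding directory $1 has at least $2 KiB available.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Print (do not run) the commands, in the order they would be applied.
print_steps() {
    echo "echo '$(core_pattern_for "$CORE_DIR")' > /proc/sys/kernel/core_pattern   # as root"
    echo "ulimit -c unlimited   # in the shell that starts the OSD"
    echo "kill -SEGV \$pid       # once the OSD has grown too much"
}
```

After sourcing the file, print_steps lists the three commands, and something like has_free_kb "$CORE_DIR" 1048576 can be used to check for roughly 1 GiB of free space before relying on that directory.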