OK, I finally managed to get something on my test cluster. Unfortunately, the
dump goes to /. Any idea how to change the destination path? My production /
won't be big enough...
--
Regards,
Sébastien Han.


On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
> ...and/or do you have the corepath set interestingly, or one of the
> core-trapping mechanisms turned on?
>
>
> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>
>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>
>>> Hmm, I just tried several times on my test cluster and I can't get any
>>> core dump. Does Ceph commit suicide or something? Is this expected
>>> behavior?
>>
>>
>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>> dumps core. Was your ulimit -c set before the daemon was started?
>>
>> sage
>>
>>
>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>
>>>> Hi Loïc,
>>>>
>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>> :-).
>>>>
>>>> Cheers
>>>> --
>>>> Regards,
>>>> Sébastien Han.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Loïc,
>>>>>
>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>> :-).
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when
>>>>>> it grows too much could be amended to dump core instead of just being
>>>>>> killed & restarted. The binary + core could probably be used to figure
>>>>>> out where the leak is.
>>>>>>
>>>>>> You should make sure the OSD's current working directory is on a file
>>>>>> system with enough free disk space to accommodate the dump, and set
>>>>>>
>>>>>> ulimit -c unlimited
>>>>>>
>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>> inhibits core dumps). When you detect that the OSD grows too much, kill
>>>>>> it with
>>>>>>
>>>>>> kill -SEGV $pid
>>>>>>
>>>>>> and upload the core found in the working directory, together with the
>>>>>> binary, to a public place. If the osd binary is compiled with -g but
>>>>>> without changing the -O settings, you should get a larger binary file
>>>>>> but no negative impact on performance. Forensic analysis will be a lot
>>>>>> easier with the debugging symbols.
>>>>>>
>>>>>> My 2cts
>>>>>>
>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>
>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I disabled scrubbing using
>>>>>>>>
>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>
>>>>>>>>
>>>>>>>> and the leak seems to be gone.
>>>>>>>>
>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>> Memory was rising every 24h. I made the change yesterday around 13h00
>>>>>>>> and the OSDs stopped growing. OSD memory even seems to go down slowly
>>>>>>>> by small blocks.
>>>>>>>>
>>>>>>>> Of course I assume disabling scrubbing is not a long-term solution
>>>>>>>> and I should re-enable it... (how do I do that, btw? What were the
>>>>>>>> default values for those parameters?)
>>>>>>>
>>>>>>>
>>>>>>> It depends on the exact commit you're on. You can see the defaults
>>>>>>> if you do
>>>>>>>
>>>>>>> ceph-osd --show-config | grep osd_scrub
>>>>>>>
>>>>>>> Thanks for testing this...
>>>>>>> I have a few other ideas to try to reproduce.
>>>>>>>
>>>>>>> sage
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
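Loïc's suggestion in this thread (have the watchdog send SIGSEGV so the OSD dumps core, instead of simply killing and restarting it) could look roughly like the following. This is a minimal POSIX shell sketch, not Sébastien's actual script, which is not shown in the thread. The function names (rss_kb, dump_if_too_big, check_all_osds), the kB threshold argument, and the use of pgrep and /proc/<pid>/status are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the "kill the OSD when it grows too much" watchdog,
# amended as Loïc suggests to send SIGSEGV so the daemon dumps core.
# Assumes Linux (/proc), awk and pgrep; names and limits are illustrative.

# Print the resident set size (kB) of process $1, or nothing if it is gone.
# /proc/<pid>/status contains a line such as "VmRSS:   123456 kB".
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Send SIGSEGV to process $1 if its RSS exceeds $2 kB.
# Returns 0 if the signal was sent, non-zero otherwise.
dump_if_too_big() {
    _pid=$1
    _limit_kb=$2
    _rss=$(rss_kb "$_pid")
    [ -n "$_rss" ] || return 1                  # process gone, or no VmRSS
    [ "$_rss" -gt "$_limit_kb" ] || return 1    # still under the limit
    echo "pid $_pid: RSS ${_rss} kB > ${_limit_kb} kB, sending SIGSEGV" >&2
    # Unlike a plain kill (SIGTERM), SIGSEGV makes the OSD log a stack trace
    # and dump core, provided ulimit -c was raised before the daemon started.
    kill -SEGV "$_pid"
}

# Check every running ceph-osd process against a limit given in kB.
check_all_osds() {
    _limit=${1:?usage: check_all_osds LIMIT_KB}
    for _p in $(pgrep -x ceph-osd); do
        dump_if_too_big "$_p" "$_limit" || :
    done
}
```

For example, calling check_all_osds 4194304 from the existing monitoring loop would signal any OSD above roughly 4 GiB of resident memory (an arbitrary example threshold). Raising ulimit -c inside the watchdog itself does not help: as Sage points out, the limit must already be in effect in the environment the daemon was started from. As for the destination path, with a relative core_pattern (the kernel default is plain "core") the file is written to the process's current working directory, which would explain the dump landing in /. On Linux, kernel.core_pattern can instead name an absolute path, e.g. sysctl -w kernel.core_pattern=/var/crash/core.%e.%p (%e is the executable name, %p the PID), provided that directory exists and has enough free space.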