Hmm, I just tried several times on my test cluster and I can't get any
core dump. Does Ceph commit suicide or something? Is that the expected
behavior?
--
Regards,
Sébastien Han.

On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> Hi Loïc,
>
> Thanks for bringing our discussion onto the ML. I'll check that tomorrow :-).
>
> Cheers
> --
> Regards,
> Sébastien Han.
>
>
> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>> Hi Loïc,
>>
>> Thanks for bringing our discussion onto the ML. I'll check that tomorrow :-).
>>
>> Cheers
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>> grows too much could be amended to make it core dump instead of just
>>> being killed and restarted. The binary plus the core could probably be
>>> used to figure out where the leak is.
>>>
>>> You should make sure the OSD's current working directory is on a file
>>> system with enough free disk space to accommodate the dump, and set
>>>
>>> ulimit -c unlimited
>>>
>>> before running it (your system default is probably ulimit -c 0, which
>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>> kill it with
>>>
>>> kill -SEGV $pid
>>>
>>> and upload the core found in the working directory, together with the
>>> binary, to a public place. If the osd binary is compiled with -g but
>>> without changing the -O settings, you should get a larger binary file
>>> but no negative impact on performance. Forensic analysis will be made a
>>> lot easier with the debugging symbols.
>>>
>>> My 2cts
>>>
>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>> > On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>> >> Hi,
>>> >>
>>> >> I disabled scrubbing using
>>> >>
>>> >>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>> >>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>> >>
>>> >> and the leak seems to be gone.
>>> >>
>>> >> See the graph at http://i.imgur.com/A0KmVot.png with the memory of
>>> >> the 12 OSD processes over the last 3.5 days.
>>> >> Memory was rising every 24h. I made the change yesterday around 13h00
>>> >> and the OSDs stopped growing. OSD memory even seems to go down slowly
>>> >> in small steps.
>>> >>
>>> >> Of course I assume disabling scrubbing is not a long-term solution and
>>> >> I should re-enable it ... (how do I do that, btw? What were the
>>> >> default values for those parameters?)
>>> >
>>> > It depends on the exact commit you're on. You can see the defaults if
>>> > you do
>>> >
>>> > ceph-osd --show-config | grep osd_scrub
>>> >
>>> > Thanks for testing this... I have a few other ideas to try to reproduce.
>>> >
>>> > sage
>>> > --
>>> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
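
[Editor's note] Loïc's suggestion above (let the watchdog script send SIGSEGV
instead of killing and restarting the OSD) could be sketched roughly as the
snippet below. The 4 GB threshold and the way the pid is obtained are
illustrative assumptions, not details taken from the thread.

```shell
#!/bin/sh
# Hedged sketch: instead of killing & restarting a leaking ceph-osd,
# send it SIGSEGV so it leaves a core dump behind for analysis.
# THRESHOLD_KB is an assumption; pick a value that fits your hosts.

THRESHOLD_KB=4194304  # dump once the resident set exceeds ~4 GB

# Resident set size (kB) of a pid, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

dump_if_too_big() {
    pid="$1"
    rss=$(rss_kb "$pid") || return 1
    if [ "${rss:-0}" -gt "$THRESHOLD_KB" ]; then
        # SIGSEGV produces a core file, provided the OSD was started
        # under 'ulimit -c unlimited' and its working directory has
        # enough free space for the dump, as Loic describes above.
        kill -SEGV "$pid"
    fi
}
```

The watchdog would then call `dump_if_too_big $osd_pid` from its existing
polling loop in place of the plain kill-and-restart step.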
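
[Editor's note] Following Sage's answer, re-enabling scrubbing amounts to
reading the build's compiled-in defaults back and injecting them into the
running OSDs. The numeric values below are assumptions (one day / one week,
the commonly cited defaults of that era); since the defaults depend on the
exact commit, use whatever `--show-config` reports for your build.

```shell
# Ask the build for its compiled-in scrub defaults (they vary by commit):
ceph-osd --show-config | grep osd_scrub

# Push the defaults back into the running OSDs. 86400/604800 are assumed
# values; substitute the numbers the command above prints for your build.
ceph osd tell \* injectargs '--osd-scrub-min-interval 86400'
ceph osd tell \* injectargs '--osd-scrub-max-interval 604800'
```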