On Mon, 4 Feb 2013, Sébastien Han wrote:
> Hum just tried several times on my test cluster and I can't get any
> core dump. Does Ceph commit suicide or something? Is it expected
> behavior?

SIGSEGV should trigger the usual path that dumps a stack trace and then
dumps core. Was your ulimit -c set before the daemon was started?

sage

> --
> Regards,
> Sébastien Han.
>
>
> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> > Hi Loïc,
> >
> > Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
> >
> > Cheers
> > --
> > Regards,
> > Sébastien Han.
> >
> >
> > On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> >> Hi Loïc,
> >>
> >> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
> >>
> >> Cheers
> >>
> >> --
> >> Regards,
> >> Sébastien Han.
> >>
> >>
> >> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
> >>> grows too much could be amended to core dump instead of just being killed &
> >>> restarted. The binary + core could probably be used to figure out where the
> >>> leak is.
> >>>
> >>> You should make sure the OSD current working directory is in a file system
> >>> with enough free disk space to accommodate the dump and set
> >>>
> >>>     ulimit -c unlimited
> >>>
> >>> before running it (your system default is probably ulimit -c 0, which
> >>> inhibits core dumps). When you detect that the OSD grows too much, kill it with
> >>>
> >>>     kill -SEGV $pid
> >>>
> >>> and upload the core found in the working directory, together with the
> >>> binary, to a public place. If the osd binary is compiled with -g but without
> >>> changing the -O settings, you should have a larger binary file but no
> >>> negative impact on performance. Forensic analysis will be made a lot
> >>> easier with the debugging symbols.
> >>>
> >>> My 2cts
> >>>
> >>> On 01/31/2013 08:57 PM, Sage Weil wrote:
> >>> > On Thu, 31 Jan 2013, Sylvain Munaut wrote:
> >>> >> Hi,
> >>> >>
> >>> >> I disabled scrubbing using
> >>> >>
> >>> >>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
> >>> >>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
> >>> >>
> >>> >> and the leak seems to be gone.
> >>> >>
> >>> >> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
> >>> >> for the 12 osd processes over the last 3.5 days.
> >>> >> Memory was rising every 24h. I did the change yesterday around 13h00
> >>> >> and OSDs stopped growing. OSD memory even seems to go down slowly by
> >>> >> small blocks.
> >>> >>
> >>> >> Of course I assume disabling scrubbing is not a long term solution and
> >>> >> I should re-enable it ... (how do I do that btw ? what were the
> >>> >> default values for those parameters)
> >>> >
> >>> > It depends on the exact commit you're on. You can see the defaults if
> >>> > you do
> >>> >
> >>> >     ceph-osd --show-config | grep osd_scrub
> >>> >
> >>> > Thanks for testing this... I have a few other ideas to try to reproduce.
> >>> >
> >>> > sage
> >>> > --
> >>> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>> --
> >>> Loïc Dachary, Artisan Logiciel Libre
> >>>
> >>
> >
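
For reference, a rough sketch of how the two points in this thread could be
checked from a watchdog script: whether the core limit seen by the running
daemon is non-zero, and whether its memory has grown enough to send SIGSEGV.
It assumes Linux, where /proc/<pid>/limits and /proc/<pid>/status expose the
soft core limit and VmRSS. The function names, PROC_ROOT, DRY_RUN and the
threshold are made up for illustration; none of this is part of Ceph or of
the script mentioned above.

```shell
#!/bin/sh
# Sketch only: helper names, PROC_ROOT and DRY_RUN are hypothetical.

# Print the soft core-file limit recorded in a /proc/<pid>/limits style file.
# "0" means the process cannot leave a core file.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Print the resident set size in kB from a /proc/<pid>/status style file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# $1: pid, $2: RSS threshold in kB.
# Returns 1 if the core limit of the process is 0. Otherwise, if RSS is above
# the threshold, sends SIGSEGV when DRY_RUN=0 and only reports it by default.
maybe_dump_core() {
    pid=$1
    max_kb=$2
    proc=${PROC_ROOT:-/proc}
    limit=$(core_soft_limit "$proc/$pid/limits")
    rss=$(rss_kb "$proc/$pid/status")
    if [ "$limit" = "0" ]; then
        echo "pid $pid: core limit is 0, SIGSEGV would not leave a core file"
        return 1
    fi
    if [ "${rss:-0}" -gt "$max_kb" ]; then
        echo "pid $pid: RSS ${rss} kB exceeds ${max_kb} kB"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            kill -SEGV "$pid"
        else
            echo "dry run: would run kill -SEGV $pid"
        fi
    fi
}
```

Something like "DRY_RUN=0 maybe_dump_core $pid 4194304" would then send the
signal once the OSD is above roughly 4 GB resident; the number is only a
placeholder. Reading the limit from /proc rather than from the calling shell
matters because ulimit -c only affects processes started after it is set.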