...and/or do you have the corepath set interestingly, or one of the
core-trapping mechanisms turned on?
On 02/04/2013 11:29 AM, Sage Weil wrote:
On Mon, 4 Feb 2013, Sébastien Han wrote:
Hum just tried several times on my test cluster and I can't get any
core dump. Does Ceph commit suicide or something? Is it expected
behavior?
SIGSEGV should trigger the usual path that dumps a stack trace and then
dumps core. Was your ulimit -c set before the daemon was started?
sage
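One way to answer the ulimit question for a daemon that is already running, sketched under the assumption of a Linux host with /proc mounted: read the soft core limit the process actually inherited from /proc/<pid>/limits, and look at the kernel core pattern to see where a core would go. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the soft "Max core file size" limit of a running process.
# A value of 0 means no core file will be written for that process.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Print the kernel core pattern. A leading '|' means cores are piped to a
# helper program instead of being written to the working directory.
core_pattern() {
    cat /proc/sys/kernel/core_pattern
}
```

For example, `for pid in $(pidof ceph-osd); do core_soft_limit "$pid"; done` shows whether each OSD was started with core dumps enabled, which is what matters here; a `ulimit -c unlimited` typed into a shell afterwards does not change a daemon that is already running.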
--
Regards,
Sébastien Han.
On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
Hi Loïc,
Thanks for bringing our discussion on the ML. I'll check that tomorrow :-).
Cheers
--
Regards,
Sébastien Han.
On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
Hi,
As discussed during FOSDEM, the script you wrote to kill the OSD when it
grows too much could be amended to core dump instead of just being killed &
restarted. The binary + core could probably be used to figure out where the
leak is.
You should make sure the OSD's current working directory is on a file system
with enough free disk space to accommodate the dump, and set
ulimit -c unlimited
before running it (your system default is probably ulimit -c 0, which
inhibits core dumps). When you detect that the OSD has grown too much, kill it with
kill -SEGV $pid
and upload the core found in the working directory, together with the
binary, to a public place. If the osd binary is compiled with -g but without
changing the -O settings, you should have a larger binary file but no
negative impact on performance. Forensic analysis will be made a lot
easier with the debugging symbols.
My 2cts
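The steps above could be folded into the existing watchdog roughly as follows. This is only a sketch, not the script discussed at FOSDEM: it assumes Linux (/proc/<pid>/status reports VmRSS in kB), and the function names and the threshold argument are hypothetical.

```shell
#!/bin/sh
# Resident set size of a process in kB, read from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Send SIGSEGV to the process if its RSS exceeds the limit (in kB), so that
# it dumps core instead of simply being killed.
# Returns 0 only if the signal was sent.
segv_if_too_big() {
    pid=$1
    limit_kb=$2
    rss=$(rss_kb "$pid")
    [ -n "$rss" ] || return 1          # process gone or no RSS reported
    [ "$rss" -gt "$limit_kb" ] && kill -SEGV "$pid"
}
```

The daemon still has to be started from a shell where ulimit -c unlimited was set; the signal alone does not produce a core if the limit it inherited is 0.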
On 01/31/2013 08:57 PM, Sage Weil wrote:
On Thu, 31 Jan 2013, Sylvain Munaut wrote:
Hi,
I disabled scrubbing using
ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
and the leak seems to be gone.
See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
for the 12 osd processes over the last 3.5 days.
Memory was rising every 24h. I did the change yesterday around 13h00
and OSDs stopped growing. OSD memory even seems to go down slowly by
small blocks.
Of course I assume disabling scrubbing is not a long term solution and
I should re-enable it ... (how do I do that, btw? What were the
default values for those parameters?)
It depends on the exact commit you're on. You can see the defaults if you do
ceph-osd --show-config | grep osd_scrub
Thanks for testing this... I have a few other ideas to try to reproduce.
sage
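To go from the --show-config output back to the injectargs form used earlier in the thread, something like the following might do. It is a sketch that assumes --show-config prints one "name = value" pair per line; the function name is invented, and it only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Read "name = value" lines on stdin and print one injectargs command for
# each of the two scrub interval options, converting the underscore form
# (osd_scrub_min_interval) to the dashed command-line form.
scrub_restore_commands() {
    awk -F ' = ' -v q="'" '
        $1 == "osd_scrub_min_interval" || $1 == "osd_scrub_max_interval" {
            opt = $1
            gsub(/_/, "-", opt)
            printf "ceph osd tell \\* injectargs %s--%s %s%s\n", q, opt, $2, q
        }'
}
```

Usage would be along the lines of `ceph-osd --show-config | scrub_restore_commands`, run with the same binary version as the cluster, since the defaults depend on the commit.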
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Loïc Dachary, Artisan Logiciel Libre