Re: [0.48.3] OSD memory leak when scrubbing


 



On 02/16/2013 08:09 AM, Andrey Korolyov wrote:
Can anyone who hit this bug please confirm that your system contains libc 2.15+?


I've seen this with 0.56.2 as well on Ubuntu 12.04, which ships libc 2.15-0ubuntu10.3.

Haven't gotten around to adding a heap profiler to it.

Wido
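For anyone else answering the libc question, a minimal sketch of a version check follows. It assumes a glibc-based system, where getconf GNU_LIBC_VERSION prints a line such as "glibc 2.15"; the function names glibc_version and version_ge are illustrative and not part of any Ceph tooling.

```shell
#!/bin/sh
# Print the glibc version of the running system, e.g. "2.15".
# Assumes glibc: getconf GNU_LIBC_VERSION prints "glibc <version>".
glibc_version() {
    getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}

# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Components are compared numerically, so 2.9 is older than 2.15.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

Usage: version_ge "$(glibc_version)" 2.15 && echo "libc is 2.15 or newer"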

On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
Oh nice, the pattern can also include a path :D, I didn't know that.
Thanks, Greg.
--
Regards,
Sébastien Han.


On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core
-Greg
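As a rough sketch of what that looks like in practice: the helpers below build a core_pattern value that sends dumps to a chosen directory. The %e, %p and %t specifiers (executable name, PID, time of dump) are described in core(5); the function names and the /data/cores directory in the usage line are made up for illustration, so substitute a file system with enough free space.

```shell
#!/bin/sh
# Build a core_pattern value that places dumps in directory $1.
# %e = executable name, %p = PID, %t = time of dump (seconds since the epoch).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "${1%/}"
}

# Apply the pattern system-wide. Needs root, and the directory must
# already exist because the kernel will not create it.
set_core_dir() {
    core_pattern_for "$1" > /proc/sys/kernel/core_pattern
}
```

Usage (as root): set_core_dir /data/cores
The setting does not survive a reboot unless it is also made persistent, for example through kernel.core_pattern in sysctl.conf.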

On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
OK, I finally managed to get a core dump on my test cluster.
Unfortunately, the dump goes to /.

Any idea how to change the destination path?

My production / won't be big enough...

--
Regards,
Sébastien Han.


On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
...and/or do you have the corepath set interestingly, or one of the
core-trapping mechanisms turned on?


On 02/04/2013 11:29 AM, Sage Weil wrote:

On Mon, 4 Feb 2013, Sébastien Han wrote:

Hum, I just tried several times on my test cluster and I can't get any
core dump. Does Ceph commit suicide or something? Is this expected
behavior?


SIGSEGV should trigger the usual path that dumps a stack trace and then
dumps core.  Was your ulimit -c set before the daemon was started?

sage
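One way to check this on a daemon that is already running is to read its limits from /proc, since the limit is inherited at start time and a later ulimit call in another shell does not change it. This is only a sketch: it assumes a Linux /proc/<pid>/limits file with the usual "Max core file size" row, and core_soft_limit is an illustrative name.

```shell
#!/bin/sh
# Print the soft core-size limit from a /proc/<pid>/limits style file ($1).
# The row looks like: "Max core file size   0   unlimited   bytes",
# so the soft limit is the fifth whitespace-separated field.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}
```

Usage: for pid in $(pidof ceph-osd); do core_soft_limit "/proc/$pid/limits"; done
A value of 0 means the process cannot dump core; "unlimited" is what ulimit -c unlimited produces.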



--
Regards,
Sébastien Han.


On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:

Hi Loïc,

Thanks for bringing our discussion on the ML. I'll check that tomorrow :-).

Cheers
--
Regards,
Sébastien Han.



On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:


Hi,

As discussed during FOSDEM, the script you wrote to kill the OSD when it
grows too much could be amended to make it dump core instead of just
killing and restarting it. The binary + core could probably be used to
figure out where the leak is.

You should make sure the OSD's current working directory is on a file
system with enough free disk space to accommodate the dump, and set

ulimit -c unlimited

before running it (your system default is probably ulimit -c 0, which
inhibits core dumps). When you detect that the OSD has grown too much,
kill it with

kill -SEGV $pid

and upload the core found in the working directory, together with the
binary, to a public place. If the osd binary is compiled with -g but
without changing the -O settings, you should have a larger binary file
but no negative impact on performance. Forensic analysis will be made a
lot easier with the debugging symbols.
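
The steps above could be sketched roughly as follows. This is not the script discussed at FOSDEM, which is not shown in this thread; it is a minimal illustration that reads the resident set size from the VmRSS line of /proc/<pid>/status and sends SIGSEGV once a threshold is crossed. The function names and the threshold in the usage line are hypothetical.

```shell
#!/bin/sh
# Print the resident set size in kB from a /proc/<pid>/status style file ($1).
# The relevant line looks like: "VmRSS:    204800 kB".
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Send SIGSEGV to PID $1 if its RSS exceeds $2 kB, so that the process
# dumps core instead of simply being killed. Does nothing if the RSS
# cannot be read (for example, the process has already exited).
segv_if_too_big() {
    rss=$(rss_kb "/proc/$1/status")
    if [ -n "$rss" ] && [ "$rss" -gt "$2" ]; then
        kill -SEGV "$1"
    fi
}
```

Usage: segv_if_too_big "$pid" 4194304 (a hypothetical 4 GB threshold, expressed in kB)
The core is only written if ulimit -c unlimited was in effect in the shell that started the daemon.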

My 2cts

On 01/31/2013 08:57 PM, Sage Weil wrote:

On Thu, 31 Jan 2013, Sylvain Munaut wrote:

Hi,

I disabled scrubbing using

ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'


and the leak seems to be gone.

See the graph at http://i.imgur.com/A0KmVot.png showing the memory of
the 12 OSD processes over the last 3.5 days. Memory was rising every
24h. I made the change yesterday around 13h00 and the OSDs stopped
growing. OSD memory even seems to go down slowly, in small blocks.

Of course I assume disabling scrubbing is not a long-term solution and
I should re-enable it... (How do I do that, btw? What were the default
values for those parameters?)


It depends on the exact commit you're on.  You can see the defaults if
you do

   ceph-osd --show-config | grep osd_scrub

Thanks for testing this... I have a few other ideas to try to reproduce.

sage
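
To re-enable scrubbing, one option is to feed those values back through the same injectargs form used above. The sketch below only prints the commands so they can be reviewed before running. It assumes the --show-config output consists of "name = value" lines, which may differ between versions, and scrub_restore_cmds is an illustrative name.

```shell
#!/bin/sh
# Read "name = value" lines on stdin and print one injectargs command for
# each of osd_scrub_min_interval and osd_scrub_max_interval, using the
# command form from earlier in the thread. Other settings are ignored.
scrub_restore_cmds() {
    awk -F ' *= *' -v q="'" '
        $1 == "osd_scrub_min_interval" || $1 == "osd_scrub_max_interval" {
            opt = $1
            gsub(/_/, "-", opt)   # option names use dashes on the command line
            printf "ceph osd tell \\* injectargs %s--%s %s%s\n", q, opt, $2, q
        }'
}
```

Usage: ceph-osd --show-config | scrub_restore_cmds
Check the printed commands against your version before running them.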
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel"
in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Loïc Dachary, Artisan Logiciel Libre







--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on