Re: [0.48.3] OSD memory leak when scrubbing

+1
--
Regards,
Sébastien Han.


On Sat, Feb 16, 2013 at 10:09 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> On 02/16/2013 08:09 AM, Andrey Korolyov wrote:
>>
>> Can anyone who hit this bug please confirm that your system contains libc
>> 2.15+?
>>
>
> I've seen this with 0.56.2 as well on Ubuntu 12.04, which ships with libc
> 2.15-0ubuntu10.3.
>
> I haven't gotten around to attaching a heap profiler to it yet.
>
> Wido
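
A minimal sketch for answering the libc question on a glibc-based system. It assumes getconf GNU_LIBC_VERSION is available and prints a line such as "glibc 2.15"; libc_version and version_ge are hypothetical helper names, not part of Ceph.

```shell
#!/bin/sh
# Print the glibc version number, e.g. "2.15" (empty if it cannot be determined).
libc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# Return 0 if version $1 is >= version $2, comparing major.minor numerically.
# Anything after the minor number (e.g. "-0ubuntu10.3") is ignored.
version_ge() {
    a_major=${1%%.*}
    a_minor=${1#*.}; a_minor=${a_minor%%[!0-9]*}
    b_major=${2%%.*}
    b_minor=${2#*.}; b_minor=${b_minor%%[!0-9]*}
    [ "$a_major" -gt "$b_major" ] && return 0
    [ "$a_major" -lt "$b_major" ] && return 1
    [ "$a_minor" -ge "$b_minor" ]
}

v=$(libc_version)
if [ -z "$v" ]; then
    echo "could not determine the glibc version"
elif version_ge "$v" 2.15; then
    echo "glibc $v: 2.15 or newer"
else
    echo "glibc $v: older than 2.15"
fi
```
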
>
>
>> On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han <han.sebastien@xxxxxxxxx>
>> wrote:
>>>
>>> Oh nice, the pattern can also include a path :D, I didn't know that.
>>> Thanks Greg
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>
>>>> Set your /proc/sys/kernel/core_pattern file. :)
>>>> http://linux.die.net/man/5/core
>>>> -Greg
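
A rough sketch of what setting core_pattern could look like. The directory /var/crash/ceph is a hypothetical example, and core_pattern_for is an illustrative helper; %e, %p and %t are the core(5) specifiers for executable name, PID and dump time. The script only prints the command to run, since writing to /proc/sys requires root.

```shell
#!/bin/sh
# Hypothetical dump directory; pick one on a file system with enough free space.
CORE_DIR=${CORE_DIR:-/var/crash/ceph}

# Build a core_pattern value that places dumps under the given directory.
# %e = executable name, %p = PID, %t = time of dump (seconds since the Epoch).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "$1"
}

pattern=$(core_pattern_for "$CORE_DIR")
echo "current pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
echo "to apply (as root): mkdir -p $CORE_DIR && echo '$pattern' > /proc/sys/kernel/core_pattern"
```
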
>>>>
>>>> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> OK, I finally managed to get a core dump on my test cluster.
>>>>> Unfortunately, the dump goes to /.
>>>>>
>>>>> Any idea how to change the destination path?
>>>>>
>>>>> My production / won't be big enough...
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> ...and/or do you have the corepath set interestingly, or one of the
>>>>>> core-trapping mechanisms turned on?
>>>>>>
>>>>>>
>>>>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hmm, I just tried several times on my test cluster and I can't get any
>>>>>>>> core dump. Does Ceph commit suicide or something? Is that expected
>>>>>>>> behavior?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>>>>>> dumps core.  Was your ulimit -c set before the daemon was started?
>>>>>>>
>>>>>>> sage
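
Resource limits are inherited when a process starts, so one way to check what Sage asks about is to read the limit of the running process rather than that of the current shell. A minimal sketch, assuming a Linux /proc file system; core_limit_of is a hypothetical helper name.

```shell
#!/bin/sh
# Print the soft "Max core file size" limit of a running process.
# /proc/<pid>/limits lists the soft limit in the 5th whitespace-separated field.
core_limit_of() {
    awk '/^Max core file size/ {print $5}' "/proc/$1/limits"
}

# Demonstration on the current shell; for a daemon one would pass its PID
# instead, e.g. one of the PIDs printed by: pidof ceph-osd
echo "soft core limit of this shell: $(core_limit_of $$)"
```

A value of 0 means no core file will be written for that process, whatever ulimit -c reports in an interactive shell.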
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Sébastien Han.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Loïc,
>>>>>>>>>
>>>>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Sébastien Han.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>>>>>>> grows too much could be amended to make it dump core instead of just
>>>>>>>>>>> being killed & restarted. The binary + core could probably be used to
>>>>>>>>>>> figure out where the leak is.
>>>>>>>>>>>
>>>>>>>>>>> You should make sure the OSD's current working directory is on a file
>>>>>>>>>>> system with enough free disk space to accommodate the dump, and set
>>>>>>>>>>>
>>>>>>>>>>> ulimit -c unlimited
>>>>>>>>>>>
>>>>>>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>>>>>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>>>>>>>>>> kill it with
>>>>>>>>>>>
>>>>>>>>>>> kill -SEGV $pid
>>>>>>>>>>>
>>>>>>>>>>> and upload the core found in the working directory, together with the
>>>>>>>>>>> binary, to a public place. If the osd binary is compiled with -g but
>>>>>>>>>>> without changing the -O settings, you should have a larger binary file
>>>>>>>>>>> but no negative impact on performance. Forensic analysis will be made
>>>>>>>>>>> a lot easier with the debugging symbols.
>>>>>>>>>>>
>>>>>>>>>>> My 2cts
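
The procedure described above could be sketched roughly as follows. This is not the script mentioned in the thread; rss_kb and segv_if_over are hypothetical helpers, the threshold in the usage comment is an arbitrary placeholder, and reading VmRSS from /proc assumes Linux.

```shell
#!/bin/sh
# Resident set size of a process in kB, read from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Send SIGSEGV to the process if its RSS exceeds the limit (in kB).
# Returns 0 only if the signal was sent.
segv_if_over() {
    pid=$1 limit_kb=$2
    rss=$(rss_kb "$pid") || return 1
    [ -n "$rss" ] && [ "$rss" -gt "$limit_kb" ] || return 1
    kill -SEGV "$pid"
}

# Usage sketch (4194304 kB = 4 GB is a placeholder threshold):
#   for pid in $(pidof ceph-osd); do segv_if_over "$pid" 4194304; done
```

A core file is only written if ulimit -c unlimited was in effect in the environment that started the daemon.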
>>>>>>>>>>>
>>>>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>>>>
>>>>>>>>>>>>> See the graph at  http://i.imgur.com/A0KmVot.png  with the OSD memory
>>>>>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>>>>>>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>>>>>>>>>>>>> small blocks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course I assume disabling scrubbing is not a long term solution and
>>>>>>>>>>>>> I should re-enable it ... (how do I do that btw ? what were the
>>>>>>>>>>>>> default values for those parameters)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It depends on the exact commit you're on.  You can see the defaults if
>>>>>>>>>>>> you do
>>>>>>>>>>>>
>>>>>>>>>>>>    ceph-osd --show-config | grep osd_scrub
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>>>>>> reproduce.
>>>>>>>>>>>>
>>>>>>>>>>>> sage
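
To re-enable scrubbing, the defaults reported by --show-config could be turned back into injectargs commands of the form used earlier in the thread. A sketch under the assumption that --show-config prints one "name = value" pair per line; restore_cmds is a hypothetical helper, and the values 111 and 222 in the demonstration are placeholders, not Ceph defaults.

```shell
#!/bin/sh
# Read "name = value" lines on stdin and print one injectargs command for each
# of the two scrub interval options, converting underscores to dashes.
restore_cmds() {
    awk -F' = ' '/^osd_scrub_(min|max)_interval / {
        opt = $1
        gsub(/_/, "-", opt)
        printf "ceph osd tell \\* injectargs '\''--%s %s'\''\n", opt, $2
    }'
}

# Demonstration with placeholder values; on a real node the input would be:
#   ceph-osd --show-config | restore_cmds
printf '%s\n' 'osd_scrub_min_interval = 111' 'osd_scrub_max_interval = 222' | restore_cmds
```

The helper only prints the commands, so they can be reviewed before being run.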
>>>>>>>>>>>> --
>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>>> ceph-devel"
>>>>>>>>>>>> in
>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>>>>> More majordomo info at
>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on