Re: [0.48.3] OSD memory leak when scrubbing

oh nice, the pattern can also contain a path :D, didn't know that
thanks Greg
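
For anyone hitting the same problem, here is a minimal sketch of the core_pattern change suggested in the quoted message below. The directory /var/crash/ceph, the CORE_DIR and APPLY variables, and the function name are only illustrative assumptions, not anything Ceph provides. The %e, %p and %t specifiers are the ones documented in core(5). Writing to /proc/sys/kernel/core_pattern needs root, affects every process on the host, and does not survive a reboot, so the sketch only prints the value unless APPLY=1 is set.

```shell
#!/bin/sh
# Build a core_pattern value that places dumps in a chosen directory.
# %e = executable name, %p = PID, %t = time of dump (seconds since the epoch);
# see core(5) for the full list of specifiers.
core_pattern_for() {
    dir=$1
    # Strip one trailing slash so we never emit "dir//core...".
    printf '%s/core.%%e.%%p.%%t\n' "${dir%/}"
}

# Hypothetical target directory: pick one on a file system with enough space.
CORE_DIR=${CORE_DIR:-/var/crash/ceph}
pattern=$(core_pattern_for "$CORE_DIR")
echo "core_pattern would be: $pattern"

# Only touch the kernel setting when explicitly asked to (requires root).
if [ "${APPLY:-0}" = "1" ]; then
    mkdir -p "$CORE_DIR"
    echo "$pattern" > /proc/sys/kernel/core_pattern
fi
```

Run it once without APPLY to check the resulting pattern, then again as root with APPLY=1 before reproducing the crash.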
--
Regards,
Sébastien Han.


On Mon, Feb 4, 2013 at 10:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core
> -Greg
>
> On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
>> ok I finally managed to get something on my test cluster,
>> unfortunately, the dump goes to /
>>
>> any idea how to change the destination path?
>>
>> My production / won't be big enough...
>>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>>> ...and/or do you have the corepath set interestingly, or one of the
>>> core-trapping mechanisms turned on?
>>>
>>>
>>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>>
>>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>>
>>>>> Hum just tried several times on my test cluster and I can't get any
>>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>>> behavior?
>>>>
>>>>
>>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>>> dumps core.  Was your ulimit -c set before the daemon was started?
>>>>
>>>> sage
>>>>
>>>>
>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi Loïc,
>>>>>>
>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
>>>>>>
>>>>>> Cheers
>>>>>> --
>>>>>> Regards,
>>>>>> Sébastien Han.
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Loïc,
>>>>>>>
>>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow :-).
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Sébastien Han.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>>>> grows too much could be amended to make the OSD dump core instead of just
>>>>>>>> killing and restarting it. The binary plus the core could probably be used
>>>>>>>> to figure out where the leak is.
>>>>>>>>
>>>>>>>> You should make sure the OSD's current working directory is on a file
>>>>>>>> system with enough free disk space to accommodate the dump, and set
>>>>>>>>
>>>>>>>> ulimit -c unlimited
>>>>>>>>
>>>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>>>>>>> kill it with
>>>>>>>>
>>>>>>>> kill -SEGV $pid
>>>>>>>>
>>>>>>>> and upload the core found in the working directory, together with the
>>>>>>>> binary, to a public place. If the osd binary is compiled with -g but
>>>>>>>> without changing the -O settings, you should get a larger binary file but
>>>>>>>> no negative impact on performance. Forensic analysis will be made a lot
>>>>>>>> easier with the debugging symbols.
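
The procedure described above can be sketched roughly as follows. This is only an illustration, not the watchdog script mentioned in the thread: the OSD_PID and LIMIT_KB variables, the 4 GiB ceiling and both helper functions are assumptions made up for the example. It reads the resident set size from /proc/<pid>/status (Linux only) and sends SIGSEGV once the limit is exceeded. A core file is only written if ulimit -c unlimited was in effect in the shell that started the daemon.

```shell
#!/bin/sh
# Resident set size of a process in kB, read from /proc/<pid>/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Succeeds (exit status 0) when the measured size exceeds the limit.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Hypothetical inputs: the OSD's PID and a memory ceiling in kB.
OSD_PID=${OSD_PID:-}
LIMIT_KB=${LIMIT_KB:-4194304}   # 4 GiB, an arbitrary example value

# Do nothing unless a PID was supplied.
if [ -n "$OSD_PID" ]; then
    current=$(rss_kb "$OSD_PID")
    if [ -n "$current" ] && over_limit "$current" "$LIMIT_KB"; then
        # SIGSEGV makes the daemon dump core instead of exiting quietly,
        # provided core dumps were enabled before it was started.
        kill -SEGV "$OSD_PID"
    fi
fi
```

Something like this could be run periodically, for example from cron, with OSD_PID set to the PID of the ceph-osd process being watched.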
>>>>>>>>
>>>>>>>> My 2cts
>>>>>>>>
>>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>>
>>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I disabled scrubbing using
>>>>>>>>>>
>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>>
>>>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png showing the OSD memory
>>>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>>>> Memory was rising every 24h. I made the change yesterday around 13h00
>>>>>>>>>> and the OSDs stopped growing. OSD memory even seems to go down slowly, in
>>>>>>>>>> small blocks.
>>>>>>>>>>
>>>>>>>>>> Of course I assume disabling scrubbing is not a long-term solution and
>>>>>>>>>> I should re-enable it... (how do I do that, btw? What were the
>>>>>>>>>> default values for those parameters?)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It depends on the exact commit you're on.  You can see the defaults if
>>>>>>>>> you do
>>>>>>>>>
>>>>>>>>>   ceph-osd --show-config | grep osd_scrub
>>>>>>>>>
>>>>>>>>> Thanks for testing this... I have a few other ideas to try to reproduce.
>>>>>>>>>
>>>>>>>>> sage
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>>>> in
>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>

