Re: [0.48.3] OSD memory leak when scrubbing

Set your /proc/sys/kernel/core_pattern file. :) http://linux.die.net/man/5/core
-Greg
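
A minimal sketch of the core_pattern change suggested above, assuming a Linux host. The directory /var/crash/ceph and the helper names core_pattern_for, free_kib and set_core_dir are hypothetical, chosen only for illustration. The %e, %p and %t specifiers are the ones documented in core(5).

```shell
#!/bin/sh
# Sketch only: the helper names and /var/crash/ceph are assumptions,
# not something taken from the thread.

# Build an absolute core_pattern so dumps land in directory $1 instead
# of the crashing process's working directory.
# %e = executable name, %p = PID, %t = time of dump (epoch seconds).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "${1%/}"
}

# Print the free space, in KiB, of the file system holding directory $1,
# to check that a large OSD core will fit before pointing dumps there.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Create the directory and write the pattern to the kernel.
# Needs root; nothing in this file calls it automatically.
set_core_dir() {
    mkdir -p "$1" || return 1
    core_pattern_for "$1" > /proc/sys/kernel/core_pattern
}
```

Run as root, something like "set_core_dir /var/crash/ceph" would redirect new dumps. The value is not kept across reboots unless it is also persisted through the kernel.core_pattern sysctl, and the daemon still needs ulimit -c set before it is started.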

On Mon, Feb 4, 2013 at 1:08 PM, Sébastien Han <han.sebastien@xxxxxxxxx> wrote:
> OK, I finally managed to get something on my test cluster.
> Unfortunately, the dump goes to /.
>
> Any idea how to change the destination path?
>
> My production / won't be big enough...
>
> --
> Regards,
> Sébastien Han.
>
>
> On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
>> ...and/or do you have the corepath set interestingly, or one of the
>> core-trapping mechanisms turned on?
>>
>>
>> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>>
>>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>>
>>>> Hum just tried several times on my test cluster and I can't get any
>>>> core dump. Does Ceph commit suicide or something? Is it expected
>>>> behavior?
>>>
>>>
>>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>>> dumps core.  Was your ulimit -c set before the daemon was started?
>>>
>>> sage
>>>
>>>
>>>
>>>> --
>>>> Regards,
>>>> Sébastien Han.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Hi Loïc,
>>>>>
>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>> :-).
>>>>>
>>>>> Cheers
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi Loïc,
>>>>>>
>>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>>> :-).
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sébastien Han.
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when
>>>>>>> it grows too much could be amended to make it dump core instead of
>>>>>>> just being killed and restarted. The binary + core could probably be
>>>>>>> used to figure out where the leak is.
>>>>>>>
>>>>>>> You should make sure the OSD's current working directory is on a
>>>>>>> file system with enough free disk space to accommodate the dump, and
>>>>>>> set
>>>>>>>
>>>>>>> ulimit -c unlimited
>>>>>>>
>>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>>>>>> kill it with
>>>>>>>
>>>>>>> kill -SEGV $pid
>>>>>>>
>>>>>>> and upload the core found in the working directory, together with the
>>>>>>> binary, to a public place. If the osd binary is compiled with -g but
>>>>>>> without changing the -O settings, you should get a larger binary file
>>>>>>> but no negative impact on performance. Forensic analysis will be made
>>>>>>> a lot easier with the debugging symbols.
>>>>>>>
>>>>>>> My 2cts
>>>>>>>
>>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>>
>>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I disabled scrubbing using
>>>>>>>>>
>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> and the leak seems to be gone.
>>>>>>>>>
>>>>>>>>> See the graph at  http://i.imgur.com/A0KmVot.png  with the OSD
>>>>>>>>> memory for the 12 osd processes over the last 3.5 days.
>>>>>>>>> Memory was rising every 24h. I made the change yesterday around 13h00
>>>>>>>>> and the OSDs stopped growing. OSD memory even seems to go down slowly
>>>>>>>>> in small blocks.
>>>>>>>>>
>>>>>>>>> Of course I assume disabling scrubbing is not a long-term solution
>>>>>>>>> and I should re-enable it... (how do I do that, btw? What were the
>>>>>>>>> default values for those parameters?)
>>>>>>>>
>>>>>>>>
>>>>>>>> It depends on the exact commit you're on.  You can see the defaults
>>>>>>>> if you do
>>>>>>>>
>>>>>>>>   ceph-osd --show-config | grep osd_scrub
>>>>>>>>
>>>>>>>> Thanks for testing this... I have a few other ideas to try to
>>>>>>>> reproduce.
>>>>>>>>
>>>>>>>> sage
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>>> in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
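
For reference, the watchdog change Loïc describes earlier in the thread (send SIGSEGV so the OSD dumps core, rather than simply killing and restarting it) could be sketched roughly as follows. This is not the script discussed at FOSDEM. The function names and the idea of passing the threshold in KiB are assumptions made for illustration, and it relies on the Linux /proc file system.

```shell
#!/bin/sh
# Sketch only: rss_kib, over_limit and dump_if_too_big are hypothetical
# helper names, not part of Ceph or of the original script.

# Resident set size of process $1 in KiB, read from /proc (Linux only).
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Succeeds when the measured value $1 is strictly above the limit $2.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Send SIGSEGV to process $1 if its RSS exceeds $2 KiB, so that it
# leaves a core file behind for later analysis.
dump_if_too_big() {
    rss=$(rss_kib "$1") || return 1
    [ -n "$rss" ] || return 1
    if over_limit "$rss" "$2"; then
        kill -SEGV "$1"
    fi
}
```

A periodic job could call, for example, "dump_if_too_big $pid 4194304" for each OSD pid. As noted in the thread, ulimit -c unlimited has to be in effect before the daemon starts, otherwise no core is written.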

