Re: [0.48.3] OSD memory leak when scrubbing

OK, I finally managed to get a core dump on my test cluster.
Unfortunately, the dump goes to /.

Any idea how to change the destination path?

My production / won't be big enough...

--
Regards,
Sébastien Han.
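
On the destination-path question above, one possible approach is sketched
here, assuming a Linux host. The kernel decides where a core file goes from
the kernel.core_pattern sysctl: a pattern with no directory part writes the
core into the process's current working directory, an absolute pattern
writes it wherever the pattern points, and a pattern starting with "|"
pipes the core to a helper program instead. The helper name
core_pattern_for and the directory /var/crash/ceph are illustrative
assumptions, not anything Ceph provides.

```shell
#!/bin/sh
# Build a core_pattern value that places core files in the given directory.
# Kernel placeholders: %e = executable name, %p = PID, %t = dump time
# (seconds since the epoch).
core_pattern_for() {
    dir=${1%/}                              # drop one trailing slash, if any
    printf '%s/core.%%e.%%p.%%t\n' "$dir"
}

# Example, to be run as root (the directory is hypothetical and must exist
# on a file system with enough free space):
#   mkdir -p /var/crash/ceph
#   sysctl -w kernel.core_pattern="$(core_pattern_for /var/crash/ceph)"
#   cat /proc/sys/kernel/core_pattern
```

The setting is system-wide and does not survive a reboot unless it is also
added to the sysctl configuration.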


On Mon, Feb 4, 2013 at 10:03 PM, Dan Mick <dan.mick@xxxxxxxxxxx> wrote:
> ...and/or do you have the corepath set interestingly, or one of the
> core-trapping mechanisms turned on?
>
>
> On 02/04/2013 11:29 AM, Sage Weil wrote:
>>
>> On Mon, 4 Feb 2013, Sébastien Han wrote:
>>>
>>> Hum just tried several times on my test cluster and I can't get any
>>> core dump. Does Ceph commit suicide or something? Is it expected
>>> behavior?
>>
>>
>> SIGSEGV should trigger the usual path that dumps a stack trace and then
>> dumps core.  Was your ulimit -c set before the daemon was started?
>>
>> sage
>>
>>
>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Sun, Feb 3, 2013 at 10:03 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi Loïc,
>>>>
>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>> :-).
>>>>
>>>> Cheers
>>>> --
>>>> Regards,
>>>> Sébastien Han.
>>>>
>>>>
>>>> On Sun, Feb 3, 2013 at 10:01 PM, Sébastien Han <han.sebastien@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Hi Loïc,
>>>>>
>>>>> Thanks for bringing our discussion to the ML. I'll check that tomorrow
>>>>> :-).
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sébastien Han.
>>>>>
>>>>>
>>>>> On Sun, Feb 3, 2013 at 7:17 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As discussed during FOSDEM, the script you wrote to kill the OSD when it
>>>>>> grows too much could be amended to core dump instead of just being
>>>>>> killed & restarted. The binary + core could probably be used to figure
>>>>>> out where the leak is.
>>>>>>
>>>>>> You should make sure the OSD's current working directory is on a file
>>>>>> system with enough free disk space to accommodate the dump, and set
>>>>>>
>>>>>> ulimit -c unlimited
>>>>>>
>>>>>> before running it (your system default is probably ulimit -c 0, which
>>>>>> inhibits core dumps). When you detect that the OSD has grown too much,
>>>>>> kill it with
>>>>>>
>>>>>> kill -SEGV $pid
>>>>>>
>>>>>> and upload the core found in the working directory, together with the
>>>>>> binary, to a public place. If the osd binary is compiled with -g but
>>>>>> without changing the -O settings, you should have a larger binary file
>>>>>> but no negative impact on performance. Forensic analysis will be made
>>>>>> a lot easier with the debugging symbols.
>>>>>>
>>>>>> My 2cts
>>>>>>
>>>>>> On 01/31/2013 08:57 PM, Sage Weil wrote:
>>>>>>>
>>>>>>> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I disabled scrubbing using
>>>>>>>>
>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>>>>>>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>>>>>>>
>>>>>>>>
>>>>>>>> and the leak seems to be gone.
>>>>>>>>
>>>>>>>> See the graph at http://i.imgur.com/A0KmVot.png with the OSD memory
>>>>>>>> for the 12 osd processes over the last 3.5 days.
>>>>>>>> Memory was rising every 24h. I did the change yesterday around 13h00
>>>>>>>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>>>>>>>> small blocks.
>>>>>>>>
>>>>>>>> Of course I assume disabling scrubbing is not a long-term solution and
>>>>>>>> I should re-enable it... (how do I do that, btw? What were the
>>>>>>>> default values for those parameters?)
>>>>>>>
>>>>>>>
>>>>>>> It depends on the exact commit you're on.  You can see the defaults if
>>>>>>> you do
>>>>>>>
>>>>>>>   ceph-osd --show-config | grep osd_scrub
>>>>>>>
>>>>>>> Thanks for testing this... I have a few other ideas to try to reproduce.
>>>>>>>
>>>>>>> sage
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>>>>>> in
>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>
>>>>>
>>>
>>>
>>
>
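
The procedure Loïc describes in the quoted thread (make sure core dumps are
enabled, then send SIGSEGV once the OSD has grown too large) could be
sketched roughly as follows. This is not the script discussed at FOSDEM,
which is not shown in the thread. It assumes a Linux /proc file system, and
the function names and the memory threshold in the usage note are
hypothetical.

```shell
#!/bin/sh
# Print the resident set size in kB from a /proc/<pid>/status file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Print the soft core-file size limit from a /proc/<pid>/limits file
# ("0" means the process cannot dump core).
core_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Succeed when the first value (kB) is greater than the second (kB).
over_limit() {
    [ "$1" -gt "$2" ]
}

# Send SIGSEGV to the process if its RSS exceeds max_kb, so that it dumps
# core instead of simply being killed. Refuse if its core limit is 0.
dump_if_too_big() {
    pid=$1
    max_kb=$2
    rss=$(rss_kb "/proc/$pid/status")
    [ -n "$rss" ] || return 1               # no such process, or no VmRSS line
    over_limit "$rss" "$max_kb" || return 0 # still under the threshold
    if [ "$(core_limit "/proc/$pid/limits")" = "0" ]; then
        echo "pid $pid: core limit is 0, no dump would be written" >&2
        return 1
    fi
    kill -SEGV "$pid"
}
```

A caller might run, for example, dump_if_too_big "$pid" 4194304 (4 GiB
expressed in kB, an arbitrary figure). The core limit check reflects Sage's
point: ulimit -c has to be raised in the environment that starts the
daemon, because changing it later in another shell does not affect a
process that is already running.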

