On 2014.05.22 19:55, Gregory Farnum wrote:
> On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
> <Kenneth.Waegeman at ugent.be> wrote:
>> ----- Message from Gregory Farnum <greg at inktank.com> ---------
>> Date: Wed, 21 May 2014 15:46:17 -0700
>> From: Gregory Farnum <greg at inktank.com>
>> Subject: Re: Expanding pg's of an erasure coded pool
>> To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be>
>> Cc: ceph-users <ceph-users at lists.ceph.com>
>>
>>> On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
>>> <Kenneth.Waegeman at ugent.be> wrote:
>>>> Thanks! I increased the max processes parameter for all daemons
>>>> quite a lot (until ulimit -u 3802720).
>>>>
>>>> These are the limits for the daemons now:
>>>> [root@ ~]# cat /proc/17006/limits
>>>> Limit                     Soft Limit           Hard Limit           Units
>>>> Max cpu time              unlimited            unlimited            seconds
>>>> Max file size             unlimited            unlimited            bytes
>>>> Max data size             unlimited            unlimited            bytes
>>>> Max stack size            10485760             unlimited            bytes
>>>> Max core file size        unlimited            unlimited            bytes
>>>> Max resident set          unlimited            unlimited            bytes
>>>> Max processes             3802720              3802720              processes
>>>> Max open files            32768                32768                files
>>>> Max locked memory         65536                65536                bytes
>>>> Max address space         unlimited            unlimited            bytes
>>>> Max file locks            unlimited            unlimited            locks
>>>> Max pending signals       95068                95068                signals
>>>> Max msgqueue size         819200               819200               bytes
>>>> Max nice priority         0                    0
>>>> Max realtime priority     0                    0
>>>> Max realtime timeout      unlimited            unlimited            us
>>>>
>>>> But this didn't help. Are there other parameters I should change?
>>>
>>> Hrm, is it exactly the same stack trace? You might need to bump the
>>> open files limit as well, although I'd be surprised. :/
>>
>> I increased the open file limit as a test to 128000; still the same results.
>>
>> Stack trace:
> <snip>
>
>> But I also see some things happening on the system while doing this:
>>
>> [root@ ~]# ceph osd pool set ecdata15 pgp_num 4096
>> set pool 16 pgp_num to 4096
>> [root@ ~]# ceph status
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 830, in <module>
>>     sys.exit(main())
>>   File "/usr/bin/ceph", line 590, in main
>>     conffile=conffile)
>>   File "/usr/lib/python2.6/site-packages/rados.py", line 198, in __init__
>>     librados_path = find_library('rados')
>>   File "/usr/lib64/python2.6/ctypes/util.py", line 209, in find_library
>>     return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>>   File "/usr/lib64/python2.6/ctypes/util.py", line 203, in _findSoname_ldconfig
>>     os.popen('LANG=C /sbin/ldconfig -p 2>/dev/null').read())
>> OSError: [Errno 12] Cannot allocate memory
>> [root@ ~]# lsof | wc
>> -bash: fork: Cannot allocate memory
>> [root@ ~]# lsof | wc
>>   21801  211209 3230028
>> [root@ ~]# ceph status
>> ^CError connecting to cluster: InterruptedOrTimeoutError
>> [root@ ~]# lsof | wc
>>    2028   17476  190947
>>
>> And meanwhile the daemons had crashed.
>>
>> I verified that the memory never ran out.
>
> Is there anything in dmesg? It sure looks like the OS thinks it's run
> out of memory one way or another.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

Could it be related to memory fragmentation?
http://dom.as/2014/01/17/on-swapping-and-kernels/
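
In case it helps, here is a rough sketch of how one could check for low-order page exhaustion on the OSD hosts, using standard Linux /proc files and dmesg; the grep patterns below are only illustrative:

    # Free pages per allocation order and zone; near-zero counts in the
    # higher orders point to fragmentation even when MemFree looks fine.
    cat /proc/buddyinfo

    # Overall commit picture (these fields exist on 2.6.32-era kernels too).
    grep -E 'MemFree|Committed_AS|CommitLimit' /proc/meminfo

    # The kernel-side evidence Greg is asking about.
    dmesg | grep -iE 'page allocation failure|out of memory'

If buddyinfo shows plenty of order-0 pages but almost nothing above order 2 or 3 while the daemons are crashing, that would support the fragmentation theory from the article above.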
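
Separately, since the per-process ulimits quoted above already look generous, a sketch of the system-wide ceilings that also bound fork()/pthread_create(), compared against what the OSDs actually consume (sysctl names are the standard Linux ones; pid 17006 is just the daemon quoted above):

    # System-wide limits that can still make thread creation fail with ENOMEM.
    sysctl kernel.pid_max kernel.threads-max vm.max_map_count

    # Total threads across all ceph-osd processes on this host.
    ps -eLf | grep '[c]eph-osd' | wc -l

    # Number of memory mappings in one daemon (each thread stack adds some).
    wc -l /proc/17006/maps

Raising pgp_num triggers a lot of peering at once, so it may be worth watching whether these counts jump toward the limits right after the change.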