OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

Hi Nathan,

that was indeed the problem! I increased the kernel.pid_max value to
65535 and the problem is gone! Thank you!

It was a bit misleading that there is also a
/proc/sys/kernel/threads-max, which has a much higher value. And since
I was only seeing around 400 processes and wasn't aware that threads
also consume PIDs, it was hard to find the root cause of this issue.
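
For anyone hitting the same thing, this is roughly what I did, via the
standard sysctl interface:

# check both limits
cat /proc/sys/kernel/pid_max
cat /proc/sys/kernel/threads-max

# raise pid_max at runtime
sysctl -w kernel.pid_max=65535

# persist across reboots
echo "kernel.pid_max = 65535" >> /etc/sysctl.conf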

Now that this problem is solved, I'm wondering whether it is a good
idea to run about 40,000 threads (in an idle cluster) on one machine.
The system has a load of around 6-7 without any traffic, maybe just
because of the intense context switching.
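
To check whether context switching really is the culprit, vmstat shows
the rate in the "cs" column:

vmstat 1 5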

Anyway, that's another topic. Thank you for your help!

Regards,
Christian

On 23.09.2014 03:21, Nathan O'Sullivan wrote:
> Hi Christian,
> 
> Your problem is probably that your kernel.pid_max (the maximum
> threads+processes across the entire system) needs to be increased - the
> default is 32768, which is too low for even a medium density
> deployment.  You can test this easily enough with
> 
> $ ps axms | wc -l
> 
> If you get a number around the 30,000 mark then you are going to be
> affected.
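> 
> To see where all those PIDs go (every thread consumes one), something
> like this should work with a standard procps ps:
> 
> $ ps -eLf | wc -l                          # total threads system-wide
> $ ps -eo nlwp,comm --sort=-nlwp | head     # top thread consumers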
> 
> There's an issue here http://tracker.ceph.com/issues/6142, although it
> doesn't seem to have gotten much traction in terms of informing users.
> 
> Regards
> Nathan
> 
> On 15/09/2014 7:13 PM, Christian Eichelmann wrote:
>> Hi all,
>>
>> I have no idea why running out of file handles should produce an "out
>> of memory" error, but well. I've increased the ulimit as you told me,
>> and nothing changed. I noticed that the OSD init script sets the max
>> open file handles explicitly, so I set the corresponding option in my
>> ceph.conf. Now the limits of an OSD process look like this:
>>
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            8388608              unlimited            bytes
>> Max core file size        unlimited            unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             2067478              2067478              processes
>> Max open files            65536                65536                files
>> Max locked memory         65536                65536                bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       2067478              2067478              signals
>> Max msgqueue size         819200               819200               bytes
>> Max nice priority         0                    0
>> Max realtime priority     0                    0
>> Max realtime timeout      unlimited            unlimited            us
>>
>> Anyway, the exact same behavior as before. I also found a mail on
>> this list from someone who had the exact same problem:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040059.html
>>
>> Unfortunately, there was no real solution for that problem either.
>>
>> So again: this is *NOT* a ulimit issue. We were running emperor and
>> dumpling on the same hardware without any issues. The problems first
>> appeared after our upgrade to firefly.
>>
>> Regards,
>> Christian
>>
>>
>> On 12.09.2014 18:26, Christian Balzer wrote:
>>> On Fri, 12 Sep 2014 12:05:06 -0400 Brian Rak wrote:
>>>
>>>> That's not how ulimit works.  Check the `ulimit -a` output.
>>>>
>>> Indeed.
>>>
>>> And to forestall the next questions, see "man initscript"; mine looks
>>> like this:
>>> ---
>>> ulimit -Hn 131072
>>> ulimit -Sn 65536
>>>
>>> # Execute the program.
>>> eval exec "$4"
>>> ---
>>>
>>> And also a /etc/security/limits.d/tuning.conf (debian) like this:
>>> ---
>>> root            soft    nofile          65536
>>> root            hard    nofile          131072
>>> *               soft    nofile          16384
>>> *               hard    nofile          65536
>>> ---
>>>
>>> Adjust these to your actual needs. There might be other limits you're
>>> hitting, but that is the most likely one.
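>>>
>>> To check what a running OSD actually ended up with, something like
>>> this should do (assuming a single ceph-osd here; adjust to your
>>> setup):
>>> ---
>>> cat /proc/$(pidof -s ceph-osd)/limits
>>> ---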
>>>
>>> Also 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy.
>>> I personally would rather do 4 RAID6 (10 disks, with OSD SSD journals)
>>> with that kind of case and enjoy the fact that my OSDs never fail. ^o^
>>>
>>> Christian (another one)
>>>
>>>
>>>> On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
>>>>> Hi,
>>>>>
>>>>> I am running all commands as root, so there are no limits for the
>>>>> processes.
>>>>>
>>>>> Regards,
>>>>> Christian
>>>>> _______________________________________
>>>>> From: Mariusz Gronczewski [mariusz.gronczewski at efigence.com]
>>>>> Sent: Friday, 12 September 2014 15:33
>>>>> To: Christian Eichelmann
>>>>> Cc: ceph-users at lists.ceph.com
>>>>> Subject: Re: OSDs are crashing with "Cannot fork" or
>>>>> "cannot create thread" but plenty of memory is left
>>>>>
>>>>> Do a cat /proc/<pid>/limits
>>>>>
>>>>> You probably hit the max processes limit or the max FD limit.
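>>>>>
>>>>> For all OSDs at once, something along these lines should work:
>>>>>
>>>>> for pid in $(pgrep ceph-osd); do
>>>>>     echo "== $pid =="
>>>>>     grep -E 'processes|open files' /proc/$pid/limits
>>>>> done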
>>>>>
>>>>>> Hi Ceph-Users,
>>>>>>
>>>>>> I have absolutely no idea what is going on on my systems...
>>>>>>
>>>>>> Hardware:
>>>>>> 45 x 4TB Harddisks
>>>>>> 2 x 6 Core CPUs
>>>>>> 256GB Memory
>>>>>>
>>>>>> When initializing all disks and joining them to the cluster, after
>>>>>> approximately 30 OSDs the remaining OSDs start crashing. When I try
>>>>>> to start them again, I see different kinds of errors. For example:
>>>>>>
>>>>>>
>>>>>> Starting Ceph osd.316 on ceph-osd-bs04...already running
>>>>>> === osd.317 ===
>>>>>> Traceback (most recent call last):
>>>>>>     File "/usr/bin/ceph", line 830, in <module>
>>>>>>       sys.exit(main())
>>>>>>     File "/usr/bin/ceph", line 773, in main
>>>>>>       sigdict, inbuf, verbose)
>>>>>>     File "/usr/bin/ceph", line 420, in new_style_command
>>>>>>       inbuf=inbuf)
>>>>>>     File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line
>>>>>> 1112, in json_command
>>>>>>       raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
>>>>>> NameError: global name 'cmd' is not defined
>>>>>> Exception thread.error: error("can't start new thread",) in <bound
>>>>>> method Rados.__del__ of <rados.Rados object
>>>>>> at 0x29ee410>> ignored
>>>>>>
>>>>>>
>>>>>> or:
>>>>>> /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
>>>>>> /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
>>>>>> /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork
>>>>>>
>>>>>> or:
>>>>>> /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
>>>>>> /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
>>>>>> Thread::try_create(): pthread_create failed with error 11
>>>>>> common/Thread.cc: In function 'void Thread::create(size_t)' thread
>>>>>> 7fcf768c9760 time 2014-09-12 15:00:28.284735
>>>>>> common/Thread.cc: 110: FAILED assert(ret == 0)
>>>>>>    ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>>>>>    1: /usr/bin/ceph-conf() [0x51de8f]
>>>>>>    2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
>>>>>>    3: (common_preinit(CephInitParameters const&, code_environment_t,
>>>>>> int)+0x48) [0x52eb78]
>>>>>>    4: (global_pre_init(std::vector<char const*, std::allocator<char
>>>>>> const*> >*, std::vector<char const*, std::allocator<char const*> >&,
>>>>>> unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
>>>>>>    5: (main()+0x17a) [0x514f6a]
>>>>>>    6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
>>>>>>    7: /usr/bin/ceph-conf() [0x5168d1]
>>>>>>    NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> needed to interpret this.
>>>>>> terminate called after throwing an instance of
>>>>>> 'ceph::FailedAssertion'
>>>>>> Aborted (core dumped)
>>>>>> /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
>>>>>> /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
>>>>>> Traceback (most recent call last):
>>>>>>     File "/usr/bin/ceph", line 830, in <module>
>>>>>>       sys.exit(main())
>>>>>>     File "/usr/bin/ceph", line 590, in main
>>>>>>       conffile=conffile)
>>>>>>     File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
>>>>>>       librados_path = find_library('rados')
>>>>>>     File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
>>>>>>       return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>>>>>>     File "/usr/lib/python2.7/ctypes/util.py", line 213, in _findSoname_ldconfig
>>>>>>       f = os.popen('/sbin/ldconfig -p 2>/dev/null')
>>>>>> OSError: [Errno 12] Cannot allocate memory
>>>>>>
>>>>>> But anyway, when I look at the memory consumption of the system:
>>>>>> # free -m
>>>>>>              total       used       free     shared    buffers     cached
>>>>>> Mem:        258450      25841     232609          0         18      15506
>>>>>> -/+ buffers/cache:      10315     248135
>>>>>> Swap:         3811          0       3811
>>>>>>
>>>>>>
>>>>>> There is more than 230GB of memory available! What is going on
>>>>>> there?
>>>>>> System:
>>>>>> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian
>>>>>> 3.14.12-1~bpo70+1
>>>>>> (2014-07-13) x86_64 GNU/Linux
>>>>>>
>>>>>> Since this is happening on other hardware as well, I don't think
>>>>>> it's hardware related. I have no idea whether this is an OS issue
>>>>>> (which would be seriously strange) or a ceph issue.
>>>>>>
>>>>>> Since this is happening only AFTER we upgraded to firefly, I guess it
>>>>>> has something to do with ceph.
>>>>>>
>>>>>> ANY idea on what is going on here would be very appreciated!
>>>>>>
>>>>>> Regards,
>>>>>> Christian
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users at lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>> -- 
>>>>> Mariusz Gronczewski, Administrator
>>>>>
>>>>> Efigence S. A.
>>>>> ul. Wołoska 9a, 02-583 Warszawa
>>>>> T: [+48] 22 380 13 13
>>>>> F: [+48] 22 380 13 14
>>>>> E: mariusz.gronczewski at efigence.com
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users at lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users at lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
> 


-- 
Christian Eichelmann
System Administrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Phone: +49 721 91374-8026
christian.eichelmann at 1und1.de

Amtsgericht Montabaur / HRB 6484
Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Chairman of the Supervisory Board: Michael Scheeren

