OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

That's not how ulimit works.  Check the `ulimit -a` output.
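A minimal check, as a sketch (assuming the OSDs are started from a root shell on the affected host): running as root lets you raise limits, but the current per-process values still apply until they are actually raised.

    # limits of the current shell -- anything started from it inherits these
    ulimit -a
    ulimit -u    # "max user processes", which also caps threads
    # limits of an already-running OSD, if one is up on this host
    cat /proc/$(pidof -s ceph-osd)/limits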

On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
> Hi,
>
> I am running all commands as root, so there are no limits for the processes.
>
> Regards,
> Christian
> _______________________________________
> From: Mariusz Gronczewski [mariusz.gronczewski at efigence.com]
> Sent: Friday, 12 September 2014 15:33
> To: Christian Eichelmann
> Cc: ceph-users at lists.ceph.com
> Subject: Re: OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left
>
> do cat /proc/<pid>/limits
>
> you probably hit the max processes limit or the max FD limit
>
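A sketch of that check across all OSDs on the node (assuming the daemons show up as ceph-osd processes): "Max processes" in the limits file caps threads as well, and each OSD holds a large number of open file descriptors.

    for p in $(pidof ceph-osd); do
        echo "== ceph-osd pid $p =="
        grep -E 'Max (processes|open files)' /proc/$p/limits
        echo "threads: $(ls /proc/$p/task | wc -l), open fds: $(ls /proc/$p/fd | wc -l)"
    done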
>> Hi Ceph-Users,
>>
>> I have absolutely no idea what is going on on my systems...
>>
>> Hardware:
>> 45 x 4TB Harddisks
>> 2 x 6 Core CPUs
>> 256GB Memory
>>
>> When initializing all disks and joining them to the cluster, after
>> approximately 30 OSDs, other OSDs start crashing. When I try to start them
>> again, I see different kinds of errors. For example:
>>
>>
>> Starting Ceph osd.316 on ceph-osd-bs04...already running
>> === osd.317 ===
>> Traceback (most recent call last):
>>    File "/usr/bin/ceph", line 830, in <module>
>>      sys.exit(main())
>>    File "/usr/bin/ceph", line 773, in main
>>      sigdict, inbuf, verbose)
>>    File "/usr/bin/ceph", line 420, in new_style_command
>>      inbuf=inbuf)
>>    File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1112,
>> in json_command
>>      raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
>> NameError: global name 'cmd' is not defined
>> Exception thread.error: error("can't start new thread",) in <bound
>> method Rados.__del__ of <rados.Rados object
>> at 0x29ee410>> ignored
>>
>>
>> or:
>> /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork
>>
>> or:
>> /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
>> /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
>> Thread::try_create(): pthread_create failed with error 11
>> common/Thread.cc: In function 'void Thread::create(size_t)' thread
>> 7fcf768c9760 time 2014-09-12 15:00:28.284735
>> common/Thread.cc: 110: FAILED assert(ret == 0)
>>   ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>   1: /usr/bin/ceph-conf() [0x51de8f]
>>   2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
>>   3: (common_preinit(CephInitParameters const&, code_environment_t,
>> int)+0x48) [0x52eb78]
>>   4: (global_pre_init(std::vector<char const*, std::allocator<char
>> const*> >*, std::vector<char const*, std::allocator<char const*> >&,
>> unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
>>   5: (main()+0x17a) [0x514f6a]
>>   6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
>>   7: /usr/bin/ceph-conf() [0x5168d1]
>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> Aborted (core dumped)
>> /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
>> Traceback (most recent call last):
>>    File "/usr/bin/ceph", line 830, in <module>
>>      sys.exit(main())
>>    File "/usr/bin/ceph", line 590, in main
>>      conffile=conffile)
>>    File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
>>      librados_path = find_library('rados')
>>    File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
>>      return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>>    File "/usr/lib/python2.7/ctypes/util.py", line 213, in
>> _findSoname_ldconfig
>>      f = os.popen('/sbin/ldconfig -p 2>/dev/null')
>> OSError: [Errno 12] Cannot allocate memory
>>
>> But anyway, when I look at the memory consumption of the system:
>> # free -m
>>               total       used       free     shared    buffers     cached
>> Mem:        258450      25841     232609          0         18      15506
>> -/+ buffers/cache:      10315     248135
>> Swap:         3811          0       3811
>>
>>
>> There are more than 230GB of memory available! What is going on there?
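With 45 OSDs on one host, one thing worth checking besides free memory is the kernel-wide task count: "Cannot fork" and pthread_create error 11 (EAGAIN) usually point at process/thread limits rather than RAM. A sketch of the comparison (the pid_max value shown is a commonly used maximum, not taken from this thread):

    ps -eLf | wc -l                      # tasks (processes + threads) currently running
    cat /proc/sys/kernel/pid_max         # often 32768 by default
    cat /proc/sys/kernel/threads-max
    # if the task count is near pid_max, raising it is a common workaround:
    # sysctl -w kernel.pid_max=4194303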
>> System:
>> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
>> (2014-07-13) x86_64 GNU/Linux
>>
>> Since this is happening on other hardware as well, I don't think it's
>> hardware related. I have no idea whether this is an OS issue (which would
>> be seriously strange) or a Ceph issue.
>>
>> Since this is happening only AFTER we upgraded to Firefly, I guess it
>> has something to do with Ceph.
>>
>> ANY idea on what is going on here would be very appreciated!
>>
>> Regards,
>> Christian
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski at efigence.com
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


