That's not how ulimit works. Check the `ulimit -a` output (a short sketch of the checks is appended at the end of this message).

On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
> Hi,
>
> I am running all commands as root, so there are no limits for the processes.
>
> Regards,
> Christian
> _______________________________________
> From: Mariusz Gronczewski [mariusz.gronczewski at efigence.com]
> Sent: Friday, 12 September 2014 15:33
> To: Christian Eichelmann
> Cc: ceph-users at lists.ceph.com
> Subject: Re: OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left
>
> do cat /proc/<pid>/limits
>
> probably you hit max processes limit or max FD limit
>
>> Hi Ceph-Users,
>>
>> I have absolutely no idea what is going on on my systems...
>>
>> Hardware:
>> 45 x 4TB hard disks
>> 2 x 6-core CPUs
>> 256GB memory
>>
>> When initializing all disks and joining them to the cluster, after
>> approximately 30 OSDs, other OSDs are crashing. When I try to start them
>> again I see different kinds of errors. For example:
>>
>> Starting Ceph osd.316 on ceph-osd-bs04...already running
>> === osd.317 ===
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 830, in <module>
>>     sys.exit(main())
>>   File "/usr/bin/ceph", line 773, in main
>>     sigdict, inbuf, verbose)
>>   File "/usr/bin/ceph", line 420, in new_style_command
>>     inbuf=inbuf)
>>   File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1112, in json_command
>>     raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
>> NameError: global name 'cmd' is not defined
>> Exception thread.error: error("can't start new thread",) in <bound
>> method Rados.__del__ of <rados.Rados object at 0x29ee410>> ignored
>>
>> or:
>> /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork
>>
>> or:
>> /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
>> /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
>> Thread::try_create(): pthread_create failed with error 11
>> common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fcf768c9760 time 2014-09-12 15:00:28.284735
>> common/Thread.cc: 110: FAILED assert(ret == 0)
>> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>> 1: /usr/bin/ceph-conf() [0x51de8f]
>> 2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
>> 3: (common_preinit(CephInitParameters const&, code_environment_t, int)+0x48) [0x52eb78]
>> 4: (global_pre_init(std::vector<char const*, std::allocator<char const*> >*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
>> 5: (main()+0x17a) [0x514f6a]
>> 6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
>> 7: /usr/bin/ceph-conf() [0x5168d1]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> Aborted (core dumped)
>> /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
>> /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 830, in <module>
>>     sys.exit(main())
>>   File "/usr/bin/ceph", line 590, in main
>>     conffile=conffile)
>>   File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
>>     librados_path = find_library('rados')
>>   File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
>>     return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>>   File "/usr/lib/python2.7/ctypes/util.py", line 213, in _findSoname_ldconfig
>>     f = os.popen('/sbin/ldconfig -p 2>/dev/null')
>> OSError: [Errno 12] Cannot allocate memory
>>
>> But anyway, when I look at the memory consumption of the system:
>> # free -m
>>              total       used       free     shared    buffers     cached
>> Mem:        258450      25841     232609          0         18      15506
>> -/+ buffers/cache:      10315     248135
>> Swap:         3811          0       3811
>>
>> There are more than 230GB of memory available! What is going on there?
>>
>> System:
>> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 (2014-07-13) x86_64 GNU/Linux
>>
>> Since this is happening on other hardware as well, I don't think it's
>> hardware related. I have no idea if this is an OS issue (which would be
>> seriously strange) or a Ceph issue.
>>
>> Since this is happening only AFTER we upgraded to Firefly, I guess it
>> has something to do with Ceph.
>>
>> ANY idea on what is going on here would be very appreciated!
>>
>> Regards,
>> Christian
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski at efigence.com
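
For reference, a minimal sketch of the checks suggested above. This is illustrative, not taken from the original thread: the ceph-osd PID lookup and the sysctl values shown are assumptions. fork() and pthread_create() return EAGAIN when a process or thread limit is hit, which surfaces as "Cannot fork" / "Cannot allocate memory" even for root and even with plenty of free RAM, so the kernel-wide limits are worth checking alongside the per-process ones:

    # Per-process limits of one running OSD (these apply even when running as root);
    # using pidof -s on "ceph-osd" is just an illustrative way to pick a PID
    cat /proc/$(pidof -s ceph-osd)/limits

    # Effective limits of the shell that starts the OSDs
    ulimit -a

    # Kernel-wide limits that can also make fork()/pthread_create() fail
    # despite free memory
    sysctl kernel.pid_max kernel.threads-max vm.max_map_count

    # Illustrative (assumed, not recommended) values to raise them temporarily:
    # sysctl -w kernel.pid_max=4194303
    # sysctl -w kernel.threads-max=1000000

With 45 OSDs per host and each ceph-osd spawning several hundred threads, it is plausible (an assumption based on the symptoms, not confirmed in this thread) that a per-user process limit or kernel.pid_max gets exhausted around the 30-OSD mark that Christian describes.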