This failure means the messenger subsystem is trying to create a thread and is getting an error code back, probably because of a process or system thread limit that you can turn up with ulimit. It happens because a replicated PG primary needs a connection only to its replicas (generally 1 or 2 connections), but with an erasure-coded PG the primary requires a connection to m+n-1 OSDs (everybody in the erasure-coding set, including itself). Right now our messenger requires a thread for each connection, so kerblam. (And it actually requires a couple such connections, because we have separate heartbeat, cluster data, and client data systems.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, May 20, 2014 at 3:43 AM, Kenneth Waegeman
<Kenneth.Waegeman at ugent.be> wrote:
> Hi,
>
> On a setup of 400 OSDs (20 nodes, with 20 OSDs per node), I first tried to
> create an erasure-coded pool with 4096 PGs, but this crashed the cluster.
> I then started with 1024 PGs, expanding to 2048 (pg_num and pgp_num); when I
> then try to expand to 4096 (not even quite enough), the cluster crashes
> again. (Do we need fewer PGs with erasure coding?)
>
> The crash starts with individual OSDs crashing, eventually bringing down
> the mons (until there is no more quorum or too few OSDs).
>
> Out of the logs:
>
>    -16> 2014-05-20 10:31:55.545590 7fd42f34d700  5 -- op tracker -- , seq: 14301, time: 2014-05-20 10:31:55.545590, event: started, request: pg_query(0.974 epoch 3315) v3
>    -15> 2014-05-20 10:31:55.545776 7fd42f34d700  1 -- 130.246.178.141:6836/10446 --> 130.246.179.191:6826/21854 -- pg_notify(0.974 epoch 3326) v5 -- ?+0 0xc8b4ec0 con 0x9026b40
>    -14> 2014-05-20 10:31:55.545807 7fd42f34d700  5 -- op tracker -- , seq: 14301, time: 2014-05-20 10:31:55.545807, event: done, request: pg_query(0.974 epoch 3315) v3
>    -13> 2014-05-20 10:31:55.559661 7fd3fdb0f700  1 -- 130.246.178.141:6837/10446 >> :/0 pipe(0xce0c380 sd=468 :6837 s=0 pgs=0 cs=0 l=0 c=0x1255f0c0).accept sd=468 130.246.179.191:60618/0
>    -12> 2014-05-20 10:31:55.564034 7fd3bf72f700  1 -- 130.246.178.141:6838/10446 >> :/0 pipe(0xe3f2300 sd=596 :6838 s=0 pgs=0 cs=0 l=0 c=0x129b5ee0).accept sd=596 130.246.179.191:43913/0
>    -11> 2014-05-20 10:31:55.627776 7fd42df4b700  1 -- 130.246.178.141:0/10446 <== osd.170 130.246.179.191:6827/21854 3 ==== osd_ping(ping_reply e3316 stamp 2014-05-20 10:31:52.994368) v2 ==== 47+0+0 (855262282 0 0) 0xb6863c0 con 0x1255b9c0
>    -10> 2014-05-20 10:31:55.629425 7fd42df4b700  1 -- 130.246.178.141:0/10446 <== osd.170 130.246.179.191:6827/21854 4 ==== osd_ping(ping_reply e3316 stamp 2014-05-20 10:31:53.509621) v2 ==== 47+0+0 (2581193378 0 0) 0x93d6c80 con 0x1255b9c0
>     -9> 2014-05-20 10:31:55.631270 7fd42f34d700  1 -- 130.246.178.141:6836/10446 <== osd.169 130.246.179.191:6841/25473 2 ==== pg_query(7.3ffs6 epoch 3326) v3 ==== 144+0+0 (221596234 0 0) 0x10b994a0 con 0x9383860
>     -8> 2014-05-20 10:31:55.631308 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631130, event: header_read, request: pg_query(7.3ffs6 epoch 3326) v3
>     -7> 2014-05-20 10:31:55.631315 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631133, event: throttled, request: pg_query(7.3ffs6 epoch 3326) v3
>     -6> 2014-05-20 10:31:55.631339 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631207, event: all_read, request: pg_query(7.3ffs6 epoch 3326) v3
>     -5> 2014-05-20 10:31:55.631343 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631303, event: dispatched, request: pg_query(7.3ffs6 epoch 3326) v3
>     -4> 2014-05-20 10:31:55.631349 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631349, event: waiting_for_osdmap, request: pg_query(7.3ffs6 epoch 3326) v3
>     -3> 2014-05-20 10:31:55.631363 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631363, event: started, request: pg_query(7.3ffs6 epoch 3326) v3
>     -2> 2014-05-20 10:31:55.631402 7fd42f34d700  5 -- op tracker -- , seq: 14302, time: 2014-05-20 10:31:55.631402, event: done, request: pg_query(7.3ffs6 epoch 3326) v3
>     -1> 2014-05-20 10:31:55.631488 7fd427b41700  1 -- 130.246.178.141:6836/10446 --> 130.246.179.191:6841/25473 -- pg_notify(7.3ffs6(14) epoch 3326) v5 -- ?+0 0xcc7b9c0 con 0x9383860
>      0> 2014-05-20 10:31:55.632127 7fd42cb49700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fd42cb49700 time 2014-05-20 10:31:55.630937
> common/Thread.cc: 110: FAILED assert(ret == 0)
>
>  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>  1: (Thread::create(unsigned long)+0x8a) [0xa83f8a]
>  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xa2a6aa]
>  3: (Accepter::entry()+0x265) [0xb3ca45]
>  4: (()+0x79d1) [0x7fd4436b19d1]
>  5: (clone()+0x6d) [0x7fd4423ecb6d]
>
> --- begin dump of recent events ---
>      0> 2014-05-20 10:31:56.622247 7fd3bc5fe700 -1 *** Caught signal (Aborted) **
>  in thread 7fd3bc5fe700
>
>  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>  1: /usr/bin/ceph-osd() [0x9ab3b1]
>  2: (()+0xf710) [0x7fd4436b9710]
>  3: (gsignal()+0x35) [0x7fd442336925]
>  4: (abort()+0x175) [0x7fd442338105]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fd442bf0a5d]
>  6: (()+0xbcbe6) [0x7fd442beebe6]
>  7: (()+0xbcc13) [0x7fd442beec13]
>  8: (()+0xbcd0e) [0x7fd442beed0e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0xaec612]
>  10: (Thread::create(unsigned long)+0x8a) [0xa83f8a]
>  11: (Pipe::connect()+0x2efb) [0xb2850b]
>  12: (Pipe::writer()+0x9f3) [0xb2a063]
>  13: (Pipe::Writer::entry()+0xd) [0xb359cd]
>  14: (()+0x79d1) [0x7fd4436b19d1]
>  15: (clone()+0x6d) [0x7fd4423ecb6d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>
> --- begin dump of recent events ---
>      0> 2014-05-20 10:37:50.378377 7ff018059700 -1 *** Caught signal (Aborted) **
>  in thread 7ff018059700
>
> In the mon:
>  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>  1: /usr/bin/ceph-mon() [0x86b991]
>  2: (()+0xf710) [0x7ff01ee5b710]
>  3: (gsignal()+0x35) [0x7ff01dad8925]
>  4: (abort()+0x175) [0x7ff01dada105]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7ff01e392a5d]
>  6: (()+0xbcbe6) [0x7ff01e390be6]
>  7: (()+0xbcc13) [0x7ff01e390c13]
>  8: (()+0xbcd0e) [0x7ff01e390d0e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f2) [0x7a5472]
>  10: (Thread::create(unsigned long)+0x8a) [0x748c9a]
>  11: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0x8351ba]
>  12: (Accepter::entry()+0x265) [0x863295]
>  13: (()+0x79d1) [0x7ff01ee539d1]
>  14: (clone()+0x6d) [0x7ff01db8eb6d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> When I make a replicated pool, I can already go to 8192 PGs without problems.
>
> Thanks already!!
>
> Kind regards,
> Kenneth
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
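For anyone wanting to gauge how much sooner an EC pool hits the thread limit, here is a rough back-of-the-envelope sketch of the arithmetic Greg describes. The EC profile (k=10, m=4) is a hypothetical assumption, since the thread doesn't give one, and the model ignores that PGs sharing the same pair of OSDs reuse a connection, so treat the results as an upper bound on the trend rather than a measurement:

```python
# Very rough model of primary<->peer connections per OSD.  All profile
# numbers below are illustrative assumptions, not taken from the thread.

def approx_peer_connections(num_osds, num_pgs, shards_per_pg, peers_per_pg):
    # Each PG has `shards_per_pg` instances spread across the OSDs, so one
    # OSD hosts roughly num_pgs * shards_per_pg / num_osds PG instances;
    # each instance talks to `peers_per_pg` other OSDs in its acting set.
    pg_instances_per_osd = num_pgs * shards_per_pg / num_osds
    return pg_instances_per_osd * peers_per_pg

# Kenneth's cluster: 400 OSDs.
# Replicated pool, size 3, 8192 PGs: a primary peers with 2 replicas.
rep = approx_peer_connections(400, 8192, shards_per_pg=3, peers_per_pg=2)

# Hypothetical EC pool, k=10/m=4: 14 shards, so m+n-1 = 13 peers; 4096 PGs.
ec = approx_peer_connections(400, 4096, shards_per_pg=14, peers_per_pg=13)

print(round(rep), round(ec))  # the EC pool needs roughly 15x more connections
```

Each such connection costs the SimpleMessenger a Pipe with its own threads, and there are separate heartbeat, cluster, and client messengers on top, so the real thread count is a few multiples of these figures. The per-process limit being hit is usually `ulimit -u` (on Linux, RLIMIT_NPROC counts threads), with the `kernel.threads-max` and `kernel.pid_max` sysctls as the system-wide knobs.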