Segmentation faults in ceph-osd

Emil Renner Berthing <ceph@xxxxxxxx> · Tue, 21 May 2013 13:21:56 +0200

Hi,

We're experiencing random segmentation faults in the OSD daemon
(ceph-osd) from the 0.61.2-1~bpo70+1 Debian packages. It happens across
all our servers, and we've seen around 40 crashes in the last week.

The crashes seem to happen more often on loaded servers, and every one
of them leaves the same error in the logs. An example can be found here:
http://esmil.dk/osdcrash.txt

Here is the backtrace from the core dump:

#0  0x00007f87b148eefb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000853a89 in reraise_fatal (signum=11) at global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x00007f87b06a96f3 in do_malloc (size=388987616) at src/tcmalloc.cc:1059
#5  cpp_alloc (nothrow=false, size=388987616) at src/tcmalloc.cc:1354
#6  tc_new (size=388987616) at src/tcmalloc.cc:1530
#7  0x00007f87a60c89b0 in ?? ()
#8  0x00000000172f7ae0 in ?? ()
#9  0x00007f87b0459b21 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#10 0x00007f87b0456ba8 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#11 0x00007f87b04424d4 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#12 0x0000000000840977 in LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound (this=0x20910a20, prefix=..., to=...) at os/LevelDBStore.h:204
#13 0x000000000083f351 in LevelDBStore::get (this=<optimized out>, prefix=..., keys=..., out=0x7f87a60c8d00) at os/LevelDBStore.cc:106
#14 0x0000000000838449 in DBObjectMap::_lookup_map_header (this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.cc:1080
#15 0x000000000083e4a9 in DBObjectMap::lookup_map_header (this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.h:404
#16 0x0000000000839e06 in DBObjectMap::rm_keys (this=0x316d4a0, hoid=..., to_clear=..., spos=0x7f87a60c9400) at os/DBObjectMap.cc:696
#17 0x00000000007f40c1 in FileStore::_omap_rmkeys (this=this@entry=0x3188000, cid=..., hoid=..., keys=..., spos=...) at os/FileStore.cc:4765
#18 0x000000000080f610 in FileStore::_do_transaction (this=this@entry=0x3188000, t=..., op_seq=op_seq@entry=4760123, trans_num=trans_num@entry=0) at os/FileStore.cc:2595
#19 0x0000000000812999 in FileStore::_do_transactions (this=this@entry=0x3188000, tls=..., op_seq=4760123, handle=handle@entry=0x7f87a60c9b80) at os/FileStore.cc:2151
#20 0x0000000000812b2e in FileStore::_do_op (this=0x3188000, osr=<optimized out>, handle=...) at os/FileStore.cc:1985
#21 0x00000000008f52ea in ThreadPool::worker (this=0x3188a08, wt=0x319c3e0) at common/WorkQueue.cc:119
#22 0x00000000008f6590 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:316
#23 0x00007f87b1486b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#24 0x00007f87af9c2a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#25 0x0000000000000000 in ?? ()
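
The size argument in frames #4 to #6 is easier to judge in a larger unit.
A minimal Python sketch for converting it follows. It is not part of Ceph
or gdb; the helper name requested_bytes is made up for illustration, and
the frame text is copied from the backtrace above:

```python
import re

# Frame #6, copied verbatim from the backtrace in this message.
FRAME = "#6  tc_new (size=388987616) at src/tcmalloc.cc:1530"


def requested_bytes(frame: str) -> int:
    """Return the integer value of the size= argument in a gdb frame line.

    Hypothetical helper for illustration only; it assumes the frame
    prints the argument as a plain decimal, as in the backtrace above.
    """
    match = re.search(r"\bsize=(\d+)", frame)
    if match is None:
        raise ValueError("no size= argument found in frame")
    return int(match.group(1))


size = requested_bytes(FRAME)
# 2**20 bytes per MiB
print(f"{size} bytes is about {size / 2**20:.0f} MiB")
```

So the allocation that was in progress when the signal arrived asked for
roughly 371 MiB in a single call.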

Please let me know if I can provide any other info to help find this bug.
/Emil