Re: Segmentation faults in ceph-osd

Hi Greg,

Here are some more stats on our servers:
- each server has 64GB ram,
- there are 12 OSDs per server,
- each OSD uses around 1.5 GB of memory,
- we have 18432 PGs,
- each OSD receives around 5 to 10 MB/s of writes and almost no reads (yet).
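
As a rough sanity check on these numbers, here is a minimal Python sketch of the per-server memory budget. It assumes the GB figures above are binary gigabytes and ignores page cache and any other processes on the box; the variable names are only for illustration. The allocation size is the size= argument from frame #4 of the backtrace quoted below.

```python
# Back-of-the-envelope memory budget per server.
# Assumption: "GB" in the list above means GiB (2**30 bytes).
GIB = 1024 ** 3
MIB = 1024 ** 2

ram_per_server = 64 * GIB
osds_per_server = 12
mem_per_osd = 1.5 * GIB          # approximate usage per ceph-osd process

# Memory used by all OSD daemons on one server, and what is left over.
osd_total = osds_per_server * mem_per_osd
headroom = ram_per_server - osd_total

# size= argument passed to do_malloc in frame #4 of the backtrace.
requested_alloc = 388987616

print(f"OSD daemons in total: {osd_total / GIB:.1f} GiB")
print(f"Remaining headroom:   {headroom / GIB:.1f} GiB")
print(f"Requested allocation: {requested_alloc / MIB:.2f} MiB")
```

So on these assumptions the OSD processes account for well under half of the RAM, and the single allocation in the backtrace is small compared to what should still be available.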

/Emil

On 21 May 2013 17:10, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> That looks like an attempt at a 370MB memory allocation. :? What's the
> memory use like on those nodes, and what's your workload?
> -Greg
>
>
> On Tuesday, May 21, 2013, Emil Renner Berthing wrote:
>>
>> Hi,
>>
>> We're experiencing random segmentation faults in the osd daemon from
>> the 0.61.2-1~bpo70+1 Debian packages. It happens across all our
>> servers and we've seen around 40 crashes in the last week.
>>
>> It seems to happen more often on loaded servers, but at least they all
>> return the same error in the logs. An example can be found here:
>> http://esmil.dk/osdcrash.txt
>>
>> Here is the backtrace from the core dump:
>>
>> #0  0x00007f87b148eefb in raise () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x0000000000853a89 in reraise_fatal (signum=11) at
>> global/signal_handler.cc:58
>> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
>> #3  <signal handler called>
>> #4  0x00007f87b06a96f3 in do_malloc (size=388987616) at
>> src/tcmalloc.cc:1059
>> #5  cpp_alloc (nothrow=false, size=388987616) at src/tcmalloc.cc:1354
>> #6  tc_new (size=388987616) at src/tcmalloc.cc:1530
>> #7  0x00007f87a60c89b0 in ?? ()
>> #8  0x00000000172f7ae0 in ?? ()
>> #9  0x00007f87b0459b21 in ?? () from
>> /usr/lib/x86_64-linux-gnu/libleveldb.so.1
>> #10 0x00007f87b0456ba8 in ?? () from
>> /usr/lib/x86_64-linux-gnu/libleveldb.so.1
>> #11 0x00007f87b04424d4 in ?? () from
>> /usr/lib/x86_64-linux-gnu/libleveldb.so.1
>> #12 0x0000000000840977 in
>> LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound
>> (this=0x20910a20, prefix=..., to=...) at os/LevelDBStore.h:204
>> #13 0x000000000083f351 in LevelDBStore::get (this=<optimized out>,
>> prefix=..., keys=..., out=0x7f87a60c8d00) at os/LevelDBStore.cc:106
>> #14 0x0000000000838449 in DBObjectMap::_lookup_map_header
>> (this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.cc:1080
>> #15 0x000000000083e4a9 in DBObjectMap::lookup_map_header
>> (this=this@entry=0x316d4a0, hoid=...) at os/DBObjectMap.h:404
>> #16 0x0000000000839e06 in DBObjectMap::rm_keys (this=0x316d4a0,
>> hoid=..., to_clear=..., spos=0x7f87a60c9400) at os/DBObjectMap.cc:696
>> #17 0x00000000007f40c1 in FileStore::_omap_rmkeys
>> (this=this@entry=0x3188000, cid=..., hoid=..., keys=..., spos=...) at
>> os/FileStore.cc:4765
>> #18 0x000000000080f610 in FileStore::_do_transaction
>> (this=this@entry=0x3188000, t=..., op_seq=op_seq@entry=4760123,
>> trans_num=trans_num@entry=0) at os/FileStore.cc:2595
>> #19 0x0000000000812999 in FileStore::_do_transactions
>> (this=this@entry=0x3188000, tls=..., op_seq=4760123,
>> handle=handle@entry=0x7f87a60c9b80) at os/FileStore.cc:2151
>> #20 0x0000000000812b2e in FileStore::_do_op (this=0x3188000,
>> osr=<optimized out>, handle=...) at os/FileStore.cc:1985
>> #21 0x00000000008f52ea in ThreadPool::worker (this=0x3188a08,
>> wt=0x319c3e0) at common/WorkQueue.cc:119
>> #22 0x00000000008f6590 in ThreadPool::WorkThread::entry
>> (this=<optimized out>) at common/WorkQueue.h:316
>> #23 0x00007f87b1486b50 in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #24 0x00007f87af9c2a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #25 0x0000000000000000 in ?? ()
>>
>> Please let me know if I can provide any other info to help find this bug.
>> /Emil
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com