Re: domino-style OSD crash

On 04/06/2012 19:40, Sam Just wrote:
Can you send the osd logs?  The merge_log crashes are probably fixable
if I can see the logs.


Sorry about that. As I said in a private mail, I was away from my computer for a long time.
I can't send those logs anymore; they have been rotated by now.

Anyway, now that I'm back, I am trying to pick up where I left off, and I tried to restart the failed nodes.

I upgraded the kernel to 3.5.0-rc4 plus some patches; btrfs seems to be OK right now.

I tried to restart the OSDs with 0.47.3, then with the next branch, and today with 0.48.

4 of the 8 nodes fail with the same message:

ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
 1: /usr/bin/ceph-osd() [0x701929]
 2: (()+0xf030) [0x7fe5b4777030]
 3: (gsignal()+0x35) [0x7fe5b33fc4f5]
 4: (abort()+0x180) [0x7fe5b33ff770]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe5b3c4f68d]
 6: (()+0x63796) [0x7fe5b3c4d796]
 7: (()+0x637c3) [0x7fe5b3c4d7c3]
 8: (()+0x639ee) [0x7fe5b3c4d9ee]
 9: (std::__throw_length_error(char const*)+0x5d) [0x7fe5b3c9f5ed]
 10: (()+0xbfad2) [0x7fe5b3ca9ad2]
 11: (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)+0x35) [0x7fe5b3cab4a5]
 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)+0x1d) [0x7fe5b3cab5bd]
 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6e811d]
 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6f681f]
 15: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x4d3) [0x6e3643]
 16: (leveldb::DBImpl::BackgroundCompaction()+0x222) [0x6e45a2]
 17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x6e4e18]
 18: /usr/bin/ceph-osd() [0x6fd401]
 19: (()+0x6b50) [0x7fe5b476eb50]
 20: (clone()+0x6d) [0x7fe5b34a278d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ceph-osd is from the Debian package (64-bit).
I have a core dump, but I'm afraid it won't help much:

gdb /usr/bin/ceph-osd core
GNU gdb (GDB) 7.0.1-debian

....

Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
---Type <return> to continue, or q <return> to quit---
#0 0x00007fe5b4776efb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0

This time I REALLY CAN (knock on wood) provide the logs and the core.

Granted, this crash was very probably caused by btrfs corruption, but it would be great if there were a way to recover the crashed OSD node.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx
