On 04/06/2012 19:40, Sam Just wrote:
> Can you send the osd logs? The merge_log crashes are probably fixable
> if I can see the logs.
Well, I'm sorry. As I said in a private mail, I was away from the computer
for a long time.
I can't send those logs anymore; they have been rotated by now...
Anyway, now that I'm back, I have picked up where I stopped and tried
to restart the failed nodes.
I upgraded the kernel to 3.5.0-rc4 + some patches, and btrfs seems to be OK
right now.
I tried to restart the osd with 0.47.3, then with the next branch, and today
with 0.48.
4 of the 8 nodes fail with the same message:
ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
1: /usr/bin/ceph-osd() [0x701929]
2: (()+0xf030) [0x7fe5b4777030]
3: (gsignal()+0x35) [0x7fe5b33fc4f5]
4: (abort()+0x180) [0x7fe5b33ff770]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe5b3c4f68d]
6: (()+0x63796) [0x7fe5b3c4d796]
7: (()+0x637c3) [0x7fe5b3c4d7c3]
8: (()+0x639ee) [0x7fe5b3c4d9ee]
9: (std::__throw_length_error(char const*)+0x5d) [0x7fe5b3c9f5ed]
10: (()+0xbfad2) [0x7fe5b3ca9ad2]
11: (char* std::string::_S_construct<char const*>(char const*, char
const*, std::allocator<char> const&, std::forward_iterator_tag)+0x35)
[0x7fe5b3cab4a5]
12: (std::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string(char const*, unsigned long,
std::allocator<char> const&)+0x1d) [0x7fe5b3cab5bd]
13:
(leveldb::InternalKeyComparator::FindShortestSeparator(std::string*,
leveldb::Slice const&) const+0x4d) [0x6e811d]
14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice
const&)+0x9f) [0x6f681f]
15:
(leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x4d3)
[0x6e3643]
16: (leveldb::DBImpl::BackgroundCompaction()+0x222) [0x6e45a2]
17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x6e4e18]
18: /usr/bin/ceph-osd() [0x6fd401]
19: (()+0x6b50) [0x7fe5b476eb50]
20: (clone()+0x6d) [0x7fe5b34a278d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
ceph-osd is from the Debian package (64-bit).
I have a core dump, but I'm afraid it won't help much:
gdb /usr/bin/ceph-osd core
GNU gdb (GDB) 7.0.1-debian
....
Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file
/var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
---Type <return> to continue, or q <return> to quit---
#0 0x00007fe5b4776efb in raise () from
/lib/x86_64-linux-gnu/libpthread.so.0
This time I REALLY CAN (knock on wood) provide logs & core.
Granted, this crash was very probably caused by corruption on btrfs, but
it would be great if there were a way to recover a crashed osd node.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx