domino-style OSD crash

Hello,

Besides the performance inconsistency (see the other thread titled "poor OSD performance using kernel 3.4", where I promised some tests; I will run them this afternoon), we spent this week-end stress-testing Ceph by making backups with Bacula onto a 15 TB RBD volume (8 OSDs, one per physical machine).

Results: it worked like a charm for two days, apart from some btrfs warning messages; then the OSDs began to crash one after another, "domino style".

This morning, only 2 of the 8 OSDs are left.

One of the physical machines was in a kernel-oops state. Nothing was remote-logged, so I don't know what happened; there was no clear stack trace. I suspect btrfs, but I have no proof.

This node (osd.7) seems to have been the first one to crash; it triggered recovery between the remaining OSDs and then led to the cascade of OSD crashes.

The other physical machines are still up, but with no OSD running. Here are some traces found in the OSD logs:

-3> 2012-06-03 12:43:32.524671 7ff1352b8700 0 log [WRN] : slow request 30.506952 seconds old, received at 2012-06-03 12:43:01.997386: osd_sub_op(osd.0.0:1842628 2.57 ea8d5657/label5_17606_object7068/head [push] v 191'628 snapset=0=[]:[] snapc=0=[]) v6 currently queued for pg
-2> 2012-06-03 12:44:32.869852 7ff1352b8700 0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.073136 secs
-1> 2012-06-03 12:44:32.869886 7ff1352b8700 0 log [WRN] : slow request 30.073136 seconds old, received at 2012-06-03 12:44:02.796651: osd_sub_op(osd.6.0:1837430 2.59 97e62059/rb.0.1.0000000a2cdf/head [push] v 1438'9416 snapset=0=[]:[] snapc=0=[]) v6 currently started
0> 2012-06-03 12:55:33.088034 7ff1237f6700 -1 *** Caught signal (Aborted) **
 in thread 7ff1237f6700

 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 1: /usr/bin/ceph-osd() [0x708ea9]
 2: (()+0xeff0) [0x7ff13af2cff0]
 3: (gsignal()+0x35) [0x7ff13950b1b5]
 4: (abort()+0x180) [0x7ff13950dfc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff139d9fdc5]
 6: (()+0xcb166) [0x7ff139d9e166]
 7: (()+0xcb193) [0x7ff139d9e193]
 8: (()+0xcb28e) [0x7ff139d9e28e]
 9: (std::__throw_length_error(char const*)+0x67) [0x7ff139d39307]
10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x72) [0x7ff139d7ab42]
 11: (()+0xa8565) [0x7ff139d7b565]
12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)+0x1b) [0x7ff139d7b7ab]
 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x4d) [0x6ef69d]
 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x9f) [0x6fdd9f]
 15: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x4d3) [0x6eaba3]
 16: (leveldb::DBImpl::BackgroundCompaction()+0x222) [0x6ebb02]
 17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x6ec378]
 18: /usr/bin/ceph-osd() [0x704981]
 19: (()+0x68ca) [0x7ff13af248ca]
 20: (clone()+0x6d) [0x7ff1395a892d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
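If I read frames 9 to 15 correctly, leveldb was building a key string during compaction (FindShortestSeparator) and the std::string constructor was handed an implausible length, so libstdc++'s _S_create threw std::length_error; nothing caught it, hence the verbose_terminate_handler and the abort. A minimal sketch of that throwing path (my own illustrative code, not from Ceph; the oversized length is made up, not taken from the log):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// std::string refuses any requested size beyond max_size() by throwing
// std::length_error from _Rep::_S_create (frames 9-10 of the trace).
// If the exception is never caught, as in the ceph-osd trace above,
// std::terminate runs and the process aborts with SIGABRT.
bool throws_length_error(std::size_t n) {
    try {
        std::string s(n, 'x');
        return false;
    } catch (const std::length_error &) {
        return true;
    }
}
```

A bogus length like that would be consistent with a corrupted length prefix read back from an sstable, but that is speculation on my part.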

Two OSDs exhibit similar traces.

---

The four others had traces like this one:

-5> 2012-06-03 13:31:39.393489 7f74fd9c7700 -1 osd.3 1513 heartbeat_check: no reply from osd.5 since 2012-06-03 13:31:18.459792 (cutoff 2012-06-03 13:31:19.393488)
-4> 2012-06-03 13:31:40.393689 7f74fd9c7700 -1 osd.3 1513 heartbeat_check: no reply from osd.5 since 2012-06-03 13:31:18.459792 (cutoff 2012-06-03 13:31:20.393687)
-3> 2012-06-03 13:31:41.402873 7f74fd9c7700 -1 osd.3 1513 heartbeat_check: no reply from osd.5 since 2012-06-03 13:31:18.459792 (cutoff 2012-06-03 13:31:21.402872)
-2> 2012-06-03 13:31:42.363270 7f74f08ac700 -1 osd.3 1513 heartbeat_check: no reply from osd.5 since 2012-06-03 13:31:18.459792 (cutoff 2012-06-03 13:31:22.363269)
-1> 2012-06-03 13:31:42.416968 7f74fd9c7700 -1 osd.3 1513 heartbeat_check: no reply from osd.5 since 2012-06-03 13:31:18.459792 (cutoff 2012-06-03 13:31:22.416966)
0> 2012-06-03 13:36:48.147020 7f74f58b6700 -1 osd/PG.cc: In function 'void PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)' thread 7f74f58b6700 time 2012-06-03 13:36:48.100157
osd/PG.cc: 402: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1eae) [0x649cce]
 2: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x2b1) [0x649fc1]
 3: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x203) [0x660343]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x6580eb]
 5: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x190) [0x6139d0]
 6: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x666) [0x5cec66]
 7: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x11b) [0x5d312b]
 8: (OSD::_dispatch(Message*)+0x173) [0x5dc273]
 9: (OSD::ms_dispatch(Message*)+0x1e7) [0x5dcba7]
 10: (SimpleMessenger::dispatch_entry()+0x979) [0x7d60a9]
 11: (SimpleMessenger::DispatchThread::entry()+0xd) [0x72781d]
 12: (()+0x68ca) [0x7f75036338ca]
 13: (clone()+0x6d) [0x7f7501cb792d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
2012-06-03 13:36:48.487021 7f74f58b6700 -1 *** Caught signal (Aborted) **
 in thread 7f74f58b6700

 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 1: /usr/bin/ceph-osd() [0x708ea9]
 2: (()+0xeff0) [0x7f750363bff0]
 3: (gsignal()+0x35) [0x7f7501c1a1b5]
 4: (abort()+0x180) [0x7f7501c1cfc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f75024aedc5]
 6: (()+0xcb166) [0x7f75024ad166]
 7: (()+0xcb193) [0x7f75024ad193]
 8: (()+0xcb28e) [0x7f75024ad28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x77d550]
 10: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, int)+0x1eae) [0x649cce]
 11: (PG::RecoveryState::Stray::react(PG::RecoveryState::MLogRec const&)+0x2b1) [0x649fc1]
 12: (boost::statechart::simple_state<PG::RecoveryState::Stray, PG::RecoveryState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x203) [0x660343]
 13: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x6580eb]
 14: (PG::RecoveryState::handle_log(int, MOSDPGLog*, PG::RecoveryCtx*)+0x190) [0x6139d0]
 15: (OSD::handle_pg_log(std::tr1::shared_ptr<OpRequest>)+0x666) [0x5cec66]
 16: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x11b) [0x5d312b]
 17: (OSD::_dispatch(Message*)+0x173) [0x5dc273]
 18: (OSD::ms_dispatch(Message*)+0x1e7) [0x5dcba7]
 19: (SimpleMessenger::dispatch_entry()+0x979) [0x7d60a9]
 20: (SimpleMessenger::DispatchThread::entry()+0xd) [0x72781d]
 21: (()+0x68ca) [0x7f75036338ca]
 22: (clone()+0x6d) [0x7f7501cb792d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
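For readers less familiar with that assert: osd/PG.cc:402 checks that the local PG log and the incoming one overlap, i.e. neither log's head is older than the other's tail; after a cascade of crashes and restarted recovery, the logs can apparently diverge enough that the invariant fails. My simplified sketch of the two checks involved (names and types are mine, with plain integers standing in for eversion_t):

```cpp
#include <cassert>

// A PG log covers the half-open version interval (tail, head].
// Merging only makes sense when the two intervals overlap; otherwise a
// stretch of history is missing on one side, which is exactly what
//   FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
// rejects. PGLogRange is a hypothetical stand-in for the real pg_log_t.
struct PGLogRange {
    int tail;
    int head;
};

bool logs_overlap(const PGLogRange &log, const PGLogRange &olog) {
    return log.head >= olog.tail && olog.head >= log.tail;
}

// The heartbeat_check lines follow a similar shape: a peer is reported
// once its last reply is older than now minus the grace period. In the
// traces above the cutoff is consistently now - 20 s, matching the
// default osd heartbeat grace of 20 seconds.
bool peer_missed_heartbeat(double now, double last_reply,
                           double grace = 20.0) {
    return last_reply < now - grace;
}
```

In the trace from osd.3, the last reply from osd.5 at 13:31:18.46 against a cutoff of 13:31:19.39 means osd.5 had been silent for just over the 20 s grace when the warnings started.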

The root cause may be btrfs... or maybe not. I don't see any btrfs crash/oops, just this:

Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479173] Pid: 16875, comm: kworker/7:0 Tainted: G W 3.4.0-dsiun-120521 #108
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479218] Call Trace:
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479243] [<ffffffff81039f1b>] ? warn_slowpath_common+0x7b/0xc0
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479278] [<ffffffffa026fca5>] ? btrfs_orphan_commit_root+0x105/0x110 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479328] [<ffffffffa026965a>] ? commit_fs_roots.isra.22+0xaa/0x170 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479379] [<ffffffffa02bc9a0>] ? btrfs_scrub_pause+0xf0/0x100 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479415] [<ffffffffa026a6f1>] ? btrfs_commit_transaction+0x521/0x9d0 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479460] [<ffffffff8105a9f0>] ? add_wait_queue+0x60/0x60
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479493] [<ffffffffa026aba0>] ? btrfs_commit_transaction+0x9d0/0x9d0 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479543] [<ffffffffa026abb1>] ? do_async_commit+0x11/0x20 [btrfs]
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479572] [<ffffffff810548a7>] ? process_one_work+0x107/0x460
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479601] [<ffffffff81055a8e>] ? worker_thread+0x14e/0x330
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479628] [<ffffffff81055940>] ? manage_workers.isra.28+0x210/0x210
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479657] [<ffffffff8105a005>] ? kthread+0x85/0x90
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479684] [<ffffffff813be3e4>] ? kernel_thread_helper+0x4/0x10
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479712] [<ffffffff81059f80>] ? kthread_freezable_should_stop+0x60/0x60
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479741] [<ffffffff813be3e0>] ? gs_change+0x13/0x13
Jun 2 23:40:03 chichibu.u14.univ-nantes.prive kernel: [200652.479767] ---[ end trace 303c47aab4b5d025 ]---
Jun 3 00:44:11 chichibu.u14.univ-nantes.prive kernel: [204497.711101] ------------[ cut here ]------------

But this is just a warning (though perhaps it could lead to a kernel oops/crash later). It seems to have been fixed recently in git kernels.


I can provide all 8 OSD logs, plus the MDS and MON logs, if that helps.


Cheers,
--

Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx


