Re: still crashing osds with next branch

Just a quick update: there were some problems with doing a rolling 
upgrade that may be responsible for these.  We're testing the fix now.

Did this, by chance, happen on a cluster with a mix of 0.47.2 and 0.48?
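
One rough way to check for a mix, assuming each OSD log carries the
"ceph version ..." banner that appears in the dumps quoted below. This is
only a sketch: the helper names are made up for illustration, and the
sample strings in the usage lines are invented, not real log output.

```python
import re
from collections import Counter

# Matches the version banner printed in the crash dumps,
# e.g. "ceph version 0.47.2-521-g88c7629".
VERSION_RE = re.compile(r"ceph version (\S+)")


def versions_seen(log_text):
    """Count how often each distinct version string occurs in one log."""
    return Counter(VERSION_RE.findall(log_text))


def is_mixed(log_texts):
    """Return True if the logs together mention more than one version."""
    seen = set()
    for text in log_texts:
        seen.update(versions_seen(text))
    return len(seen) > 1


# Invented sample input: one log per OSD host.
sample_logs = [
    "ceph version 0.47.2-521-g88c7629\n 1: /usr/bin/ceph-osd() [0x70e429]",
    "ceph version 0.48\n 1: /usr/bin/ceph-osd() [0x70e429]",
]
print(is_mixed(sample_logs))
```

Feeding it the text of each OSD's log would show whether more than one
build has been running; it says nothing about which daemons are running
which build right now.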

sage


On Wed, 20 Jun 2012, Stefan Priebe wrote:

> Does nobody have an idea? Should I open bugs in the tracker?
> 
> On 20.06.2012 15:30, Stefan Priebe - Profihost AG wrote:
> > Hmm, the same OSDs are crashing again, mostly while shutting
> > down or restarting a KVM machine.
> > 
> > This time:
> > ####### Server 1 ########################
> >       0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
> > (Segmentation fault) **
> >   in thread 7f1664052700
> > 
> >   ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> >   1: /usr/bin/ceph-osd() [0x70e429]
> >   2: (()+0xeff0) [0x7f16714d5ff0]
> >   3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> >   4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> >   5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
> >   6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
> >   7: (()+0x68ca) [0x7f16714cd8ca]
> >   8: (clone()+0x6d) [0x7f166fb51c0d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- end dump of recent events ---
> > 
> > 
> > And on the second server:
> > ####### Server 2 ########################
> > 
> >   thread 7ff933ef4700 time 2012-06-20 15:20:12.450641
> > osd/ReplicatedPG.cc: 968: FAILED assert(obc->registered)
> > 
> >   ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> >   1: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> >   2: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> >   3: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> >   4: (ThreadPool::worker()+0xb38) [0x7bbf78]
> >   5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> >   6: (()+0x68ca) [0x7ff9444768ca]
> >   7: (clone()+0x6d) [0x7ff942afac0d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> >       0> 2012-06-20 15:20:12.466828 7ff939800700 -1 ./common/Mutex.h: In
> > function 'void Mutex::Lock(bool)' thread 7ff939800700 time
> > 2012-06-20 15:20:12.466152
> > ./common/Mutex.h: 110: FAILED assert(r == 0)
> > 
> >   ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> >   1: /usr/bin/ceph-osd() [0x51a05d]
> >   2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> >   3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> >   4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> >   5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> >   6: (()+0x68ca) [0x7ff9444768ca]
> >   7: (clone()+0x6d) [0x7ff942afac0d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- end dump of recent events ---
> > 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal (Aborted) **
> >   in thread 7ff933ef4700
> > 
> >   ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> >   1: /usr/bin/ceph-osd() [0x70e429]
> >   2: (()+0xeff0) [0x7ff94447eff0]
> >   3: (gsignal()+0x35) [0x7ff942a5d225]
> >   4: (abort()+0x180) [0x7ff942a60030]
> >   5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
> >   6: (()+0xcb166) [0x7ff9432f0166]
> >   7: (()+0xcb193) [0x7ff9432f0193]
> >   8: (()+0xcb28e) [0x7ff9432f028e]
> >   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x940) [0x78ae90]
> >   10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> >   11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> >   12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> >   13: (ThreadPool::worker()+0xb38) [0x7bbf78]
> >   14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> >   15: (()+0x68ca) [0x7ff9444768ca]
> >   16: (clone()+0x6d) [0x7ff942afac0d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- begin dump of recent events ---
> >       0> 2012-06-20 15:20:12.511987 7ff933ef4700 -1 *** Caught signal
> > (Aborted) **
> >   in thread 7ff933ef4700
> > 
> >   ceph version 0.47.2-521-g88c7629
> > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> >   1: /usr/bin/ceph-osd() [0x70e429]
> >   2: (()+0xeff0) [0x7ff94447eff0]
> >   3: (gsignal()+0x35) [0x7ff942a5d225]
> >   4: (abort()+0x180) [0x7ff942a60030]
> >   5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7ff9432f1dc5]
> >   6: (()+0xcb166) [0x7ff9432f0166]
> >   7: (()+0xcb193) [0x7ff9432f0193]
> >   8: (()+0xcb28e) [0x7ff9432f028e]
> >   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x940) [0x78ae90]
> >   10: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x34a0)
> > [0x56c3c0]
> >   11: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0x1af) [0x61e8cf]
> >   12: (OSD::dequeue_op(PG*)+0x39a) [0x5b43ca]
> >   13: (ThreadPool::worker()+0xb38) [0x7bbf78]
> >   14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> >   15: (()+0x68ca) [0x7ff9444768ca]
> >   16: (clone()+0x6d) [0x7ff942afac0d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- end dump of recent events ---
> > 
> > 
> > On 20.06.2012 12:03, Stefan Priebe - Profihost AG wrote:
> > > Hello list,
> > > 
> > > I'm still seeing OSD crashes with the next branch under KVM load. If
> > > you need the core dumps, please tell me.
> > > 
> > > Here are TWO different crashes.
> > > 
> > > Here are the last log lines:
> > > 
> > > ########### CRASH 1 ###########
> > > 
> > > -3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b( v
> > > 105'29708 (103'28588,105'29708] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29708 active+degraded] watch:
> > > oi.user_version=28492
> > > -2> 2012-06-20 11:59:06.496350 7f166074a700 0 osd.13 105 pg[4.64b( v
> > > 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> > > ctx->obc=0x9f94840 cookie=1 oi.version=29709 ctx->at_version=105'29710
> > > -1> 2012-06-20 11:59:06.496386 7f166074a700 0 osd.13 105 pg[4.64b( v
> > > 105'29709 (103'28588,105'29709] n=25 ec=56 les/c 105/105 104/104/104)
> > > [13] r=0 lpr=104 mlcod 105'29709 active+degraded] watch:
> > > oi.user_version=28492
> > > 0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal
> > > (Segmentation fault) **
> > > in thread 7f1664052700
> > > 
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f16714d5ff0]
> > > 3: (OSD::disconnect_session_watches(OSD::Session*)+0x418) [0x5c80b8]
> > > 4: (OSD::ms_handle_reset(Connection*)+0x13b) [0x5c88db]
> > > 5: (SimpleMessenger::dispatch_entry()+0x1145) [0x72ca85]
> > > 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x719dad]
> > > 7: (()+0x68ca) [0x7f16714cd8ca]
> > > 8: (clone()+0x6d) [0x7f166fb51c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > > 
> > > --- end dump of recent events ---
> > > 
> > > 
> > > ########### CRASH 2 ###########
> > > 
> > > 0> 2012-06-20 11:56:46.339027 7f39d5c0a700 -1 ./common/Mutex.h: In
> > > function 'void Mutex::Lock(bool)' thread 7f39d5c0a700 time 2012-06-20
> > > 11:56:46.338403
> > > ./common/Mutex.h: 110: FAILED assert(r == 0)
> > > 
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x51a05d]
> > > 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 4: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 6: (()+0x68ca) [0x7f39e10818ca]
> > > 7: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > > 
> > > --- end dump of recent events ---
> > > 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal (Aborted) **
> > > in thread 7f39d5c0a700
> > > 
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f39e1089ff0]
> > > 3: (gsignal()+0x35) [0x7f39df668225]
> > > 4: (abort()+0x180) [0x7f39df66b030]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> > > 6: (()+0xcb166) [0x7f39dfefb166]
> > > 7: (()+0xcb193) [0x7f39dfefb193]
> > > 8: (()+0xcb28e) [0x7f39dfefb28e]
> > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x940) [0x78ae90]
> > > 10: /usr/bin/ceph-osd() [0x51a05d]
> > > 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 15: (()+0x68ca) [0x7f39e10818ca]
> > > 16: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > > 
> > > --- begin dump of recent events ---
> > > 0> 2012-06-20 11:56:46.355013 7f39d5c0a700 -1 *** Caught signal
> > > (Aborted) **
> > > in thread 7f39d5c0a700
> > > 
> > > ceph version 0.47.2-521-g88c7629
> > > (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
> > > 1: /usr/bin/ceph-osd() [0x70e429]
> > > 2: (()+0xeff0) [0x7f39e1089ff0]
> > > 3: (gsignal()+0x35) [0x7f39df668225]
> > > 4: (abort()+0x180) [0x7f39df66b030]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f39dfefcdc5]
> > > 6: (()+0xcb166) [0x7f39dfefb166]
> > > 7: (()+0xcb193) [0x7f39dfefb193]
> > > 8: (()+0xcb28e) [0x7f39dfefb28e]
> > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x940) [0x78ae90]
> > > 10: /usr/bin/ceph-osd() [0x51a05d]
> > > 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c3a]
> > > 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2dc) [0x68422c]
> > > 13: (ThreadPool::worker()+0xbb7) [0x7bbff7]
> > > 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f1dad]
> > > 15: (()+0x68ca) [0x7f39e10818ca]
> > > 16: (clone()+0x6d) [0x7f39df705c0d]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> > > 
> > > --- end dump of recent events ---
> > > 
> > > Stefan
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 