Power Outage


 



I don't believe you can re-add an OSD after `ceph osd rm`, but it's worth a
shot.  Let me see what I can do on my dev cluster.

What do `ceph osd dump` and `ceph osd tree` say?  I want to make sure I'm
starting from the same point you are.
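
If it is easier to grab both outputs in one go, here is a minimal Python
sketch.  The helper name `collect_osd_state` and the injectable `run`
argument are made up for illustration; the only things it assumes about
Ceph are the two commands named above.

```python
import subprocess


def collect_osd_state(run=subprocess.check_output):
    """Return the output of `ceph osd dump` and `ceph osd tree`, keyed by command.

    `run` is a hypothetical hook so the function can be exercised without a
    cluster; by default it shells out to the real `ceph` CLI.
    """
    commands = (["ceph", "osd", "dump"], ["ceph", "osd", "tree"])
    return {" ".join(cmd): run(cmd).decode() for cmd in commands}


if __name__ == "__main__":
    # Print each command followed by its output, ready to paste into a reply.
    for name, output in collect_osd_state().items():
        print("# " + name)
        print(output)
```

Run it on a node that has a working admin keyring, and paste the result
into your reply.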



On Wed, Jul 16, 2014 at 7:39 PM, hjcho616 <hjcho616 at yahoo.com> wrote:

> I did a "ceph osd rm" for all three, but I didn't do anything else to them
> afterwards.  Can they be added back?
>
> Regards,
> Hong
>
>
>   On Wednesday, July 16, 2014 6:54 PM, Craig Lewis <
> clewis at centraldesktop.com> wrote:
>
>
>  For some reason you ended up in my spam folder.  That might be why you
> didn't get any responses.
>
>
> Have you destroyed osd.0, osd.1, and osd.2?  If not, try bringing them up
> one at a time.  You might have just one bad disk, which is much better than
> losing 50% of your disks.
>
> How is the ceph-osd process behaving when it hits the suicide timeout?  I
> had some problems a while back where the ceph-osd process would start up,
> consume ~200% CPU for a while, then get stuck using almost exactly
> 100% CPU.  It would get kicked out of the cluster for being unresponsive,
> then suicide.  Repeat.  If that's happening here, I can suggest some things
> to try.
>
>
>
>
>
> On Fri, Jul 11, 2014 at 9:12 PM, hjcho616 <hjcho616 at yahoo.com> wrote:
>
> I have 2 OSD machines with 3 OSDs running on each, and one MDS server with
> 3 daemons running.  I ran cephfs mostly on 0.78.  One night we lost power
> for a split second.  MDS1 and OSD2 went down; OSD1 seemed OK, but it turns
> out OSD1 suffered the most.  Those two machines rebooted and seemed OK,
> except that there were some inconsistencies.  I waited for a while, but
> they didn't fix themselves, so I issued 'ceph pg repair pgnum'.  It would
> repair some, and then some OSD would crash.  I tried this for multiple days
> and got some PGs fixed, but mostly it would crash an OSD and stop
> recovering.  dmesg shows something like the below.
>
>
>
> [  740.059498] traps: ceph-osd[5279] general protection ip:7f84e75ec75e
> sp:7fff00045bc0 error:0 in libtcmalloc.so.4.1.0[7f84e75b3000+4a000]
>
> and the ceph-osd log shows something like this.
>
>      -2> 2014-07-09 20:51:01.163571 7fe0f4617700  1 heartbeat_map
> is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had timed out after 60
>     -1> 2014-07-09 20:51:01.163609 7fe0f4617700  1 heartbeat_map
> is_healthy 'FileStore::op_tp thread 0x7fe0e8e91700' had suicide timed out
> after 180
>      0> 2014-07-09 20:51:01.169542 7fe0f4617700 -1 common/HeartbeatMap.cc:
> In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
> const char*, time_t)' thread 7fe0f4617700 time 2014-07-09 20:51:01.163642
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x2eb) [0xad2cbb]
>  2: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6]
>  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8]
>  4: (CephContextServiceThread::entry()+0x13f) [0xb9911f]
>  5: (()+0x8062) [0x7fe0f797e062]
>  6: (clone()+0x6d) [0x7fe0f62bea3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.0.log
> --- end dump of recent events ---
> 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal (Aborted) **
>  in thread 7fe0f4617700
>
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-osd() [0xaac562]
>  2: (()+0xf880) [0x7fe0f7985880]
>  3: (gsignal()+0x39) [0x7fe0f620e3a9]
>  4: (abort()+0x148) [0x7fe0f62114c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5]
>  6: (()+0x5e746) [0x7fe0f6af9746]
>  7: (()+0x5e773) [0x7fe0f6af9773]
>  8: (()+0x5e9b2) [0x7fe0f6af99b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0xb85b6a]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x2eb) [0xad2cbb]
>  11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8]
>  13: (CephContextServiceThread::entry()+0x13f) [0xb9911f]
>  14: (()+0x8062) [0x7fe0f797e062]
>  15: (clone()+0x6d) [0x7fe0f62bea3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- begin dump of recent events ---
>      0> 2014-07-09 20:51:01.534706 7fe0f4617700 -1 *** Caught signal
> (Aborted) **
>  in thread 7fe0f4617700
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-osd() [0xaac562]
>  2: (()+0xf880) [0x7fe0f7985880]
>  3: (gsignal()+0x39) [0x7fe0f620e3a9]
>  4: (abort()+0x148) [0x7fe0f62114c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe0f6afb5e5]
>  6: (()+0x5e746) [0x7fe0f6af9746]
>  7: (()+0x5e773) [0x7fe0f6af9773]
>  8: (()+0x5e9b2) [0x7fe0f6af99b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0xb85b6a]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x2eb) [0xad2cbb]
>  11: (ceph::HeartbeatMap::is_healthy()+0xb6) [0xad34c6]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xad3aa8]
>  13: (CephContextServiceThread::entry()+0x13f) [0xb9911f]
>  14: (()+0x8062) [0x7fe0f797e062]
>  15: (clone()+0x6d) [0x7fe0f62bea3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.0.log
> --- end dump of recent events ---
>
> After several attempts at it, osd.2 (which is on OSD1, the machine that
> survived the power event) never came up.  It looks like the journal was
> corrupted:
>
>     -1> 2014-07-09 20:44:14.992840 7f12256b67c0 -1 journal Unable to read
> past sequence 2157634 but header indicates the journal has committed up
> through 2157670, journal is corrupt
>      0> 2014-07-09 20:44:14.998742 7f12256b67c0 -1 os/FileJournal.cc: In
> function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&,
> bool*)' thread 7f12256b67c0 time 2014-07-09 20:44:14.993082
> os/FileJournal.cc: 1677: FAILED assert(0)
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
> bool*)+0x467) [0xa8d497]
>  2: (JournalingObjectStore::journal_replay(unsigned long)+0x22e) [0x9dfebe]
>  3: (FileStore::mount()+0x32c9) [0x9b7939]
>  4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78d8fa]
>  5: (main()+0x2237) [0x730837]
>  6: (__libc_start_main()+0xf5) [0x7f12236bdb45]
>  7: /usr/bin/ceph-osd() [0x734479]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2014-07-09 20:44:15.010090 7f12256b67c0 -1 *** Caught signal (Aborted) **
>  in thread 7f12256b67c0
> ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-osd() [0xaac562]
>  2: (()+0xf880) [0x7f1224e48880]
>  3: (gsignal()+0x39) [0x7f12236d13a9]
>  4: (abort()+0x148) [0x7f12236d44c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1223fbe5e5]
>  6: (()+0x5e746) [0x7f1223fbc746]
>  7: (()+0x5e773) [0x7f1223fbc773]
>  8: (()+0x5e9b2) [0x7f1223fbc9b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0xb85b6a]
>  10: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
> bool*)+0x467) [0xa8d497]
>  11: (JournalingObjectStore::journal_replay(unsigned long)+0x22e)
> [0x9dfebe]
>  12: (FileStore::mount()+0x32c9) [0x9b7939]
>  13: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78d8fa]
>  14: (main()+0x2237) [0x730837]
>  15: (__libc_start_main()+0xf5) [0x7f12236bdb45]
>   16: /usr/bin/ceph-osd() [0x734479]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> --- begin dump of recent events ---
>      0> 2014-07-09 20:44:15.010090 7f12256b67c0 -1 *** Caught signal
> (Aborted) **
>  in thread 7f12256b67c0
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-osd() [0xaac562]
>  2: (()+0xf880) [0x7f1224e48880]
>  3: (gsignal()+0x39) [0x7f12236d13a9]
>  4: (abort()+0x148) [0x7f12236d44c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1223fbe5e5]
>  6: (()+0x5e746) [0x7f1223fbc746]
>  7: (()+0x5e773) [0x7f1223fbc773]
>  8: (()+0x5e9b2) [0x7f1223fbc9b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0xb85b6a]
>  10: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
> bool*)+0x467) [0xa8d497]
>  11: (JournalingObjectStore::journal_replay(unsigned long)+0x22e)
> [0x9dfebe]
>  12: (FileStore::mount()+0x32c9) [0x9b7939]
>  13: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78d8fa]
>  14: (main()+0x2237) [0x730837]
>  15: (__libc_start_main()+0xf5) [0x7f12236bdb45]
>  16: /usr/bin/ceph-osd() [0x734479]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>  --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 keyvaluestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
>
>
> So I thought maybe upgrading to 0.82 would give it a better chance at
> fixing things... so I did.  Now not only do those OSDs fail (osd.1 is up
> but using only 14M of memory... I assume that's broken too), but the MDS
> fails too.
>
> # /usr/bin/ceph-mds -i MDS1 --pid-file /var/run/ceph/mds.MDS1.pid -c
> /etc/ceph/ceph.conf --cluster ceph -f --debug-mds=20 --debug-journaler=10
> starting mds.MDS1 at :/0
> mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread
> 7f8e07c21700 time 2014-07-09 21:01:10.190965
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7f8e0fe3f062]
>  4: (clone()+0x6d) [0x7f8e0ebd3a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> 2014-07-09 21:01:10.192936 7f8e07c21700 -1 mds/MDLog.cc: In function 'void
> MDLog::_replay_thread()' thread 7f8e07c21700 time 2014-07-09 21:01:10.190965
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7f8e0fe3f062]
>  4: (clone()+0x6d) [0x7f8e0ebd3a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2014-07-09 21:01:10.192936 7f8e07c21700 -1 mds/MDLog.cc: In
> function 'void MDLog::_replay_thread()' thread 7f8e07c21700 time 2014-07-09
> 21:01:10.190965
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7f8e0fe3f062]
>  4: (clone()+0x6d) [0x7f8e0ebd3a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
>  in thread 7f8e07c21700
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7f8e0fe46880]
>  3: (gsignal()+0x39) [0x7f8e0eb233a9]
>  4: (abort()+0x148) [0x7f8e0eb264c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f8e0f4105e5]
>  6: (()+0x5e746) [0x7f8e0f40e746]
>  7: (()+0x5e773) [0x7f8e0f40e773]
>  8: (()+0x5e9b2) [0x7f8e0f40e9b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7f8e0fe3f062]
>  13: (clone()+0x6d) [0x7f8e0ebd3a3d]
> 2014-07-09 21:01:10.201968 7f8e07c21700 -1 *** Caught signal (Aborted) **
>  in thread 7f8e07c21700
>
>   ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7f8e0fe46880]
>  3: (gsignal()+0x39) [0x7f8e0eb233a9]
>  4: (abort()+0x148) [0x7f8e0eb264c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f8e0f4105e5]
>  6: (()+0x5e746) [0x7f8e0f40e746]
>  7: (()+0x5e773) [0x7f8e0f40e773]
>  8: (()+0x5e9b2) [0x7f8e0f40e9b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7f8e0fe3f062]
>  13: (clone()+0x6d) [0x7f8e0ebd3a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2014-07-09 21:01:10.201968 7f8e07c21700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f8e07c21700
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7f8e0fe46880]
>  3: (gsignal()+0x39) [0x7f8e0eb233a9]
>  4: (abort()+0x148) [0x7f8e0eb264c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f8e0f4105e5]
>  6: (()+0x5e746) [0x7f8e0f40e746]
>  7: (()+0x5e773) [0x7f8e0f40e773]
>  8: (()+0x5e9b2) [0x7f8e0f40e9b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7f8e0fe3f062]
>  13: (clone()+0x6d) [0x7f8e0ebd3a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Aborted
> root at MDS1:/var/log/ceph# /usr/bin/ceph-mds -i MDS1 --pid-file
> /var/run/ceph/mds.MDS1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
> --debug-mds=20 --debug-journaler=10
> starting mds.MDS1 at :/0
> mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread
> 7fb7f7b83700 time 2014-07-09 23:21:43.383304
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7fb7ffda1062]
>  4: (clone()+0x6d) [0x7fb7feb35a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> 2014-07-09 23:21:43.385274 7fb7f7b83700 -1 mds/MDLog.cc: In function 'void
> MDLog::_replay_thread()' thread 7fb7f7b83700 time 2014-07-09
> 23:21:43.383304
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7fb7ffda1062]
>  4: (clone()+0x6d) [0x7fb7feb35a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2014-07-09 23:21:43.385274 7fb7f7b83700 -1 mds/MDLog.cc: In
> function 'void MDLog::_replay_thread()' thread 7fb7f7b83700 time 2014-07-09
> 23:21:43.383304
> mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  3: (()+0x8062) [0x7fb7ffda1062]
>  4: (clone()+0x6d) [0x7fb7feb35a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
>  in thread 7fb7f7b83700
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7fb7ffda8880]
>  3: (gsignal()+0x39) [0x7fb7fea853a9]
>  4: (abort()+0x148) [0x7fb7fea884c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb7ff3725e5]
>  6: (()+0x5e746) [0x7fb7ff370746]
>  7: (()+0x5e773) [0x7fb7ff370773]
>  8: (()+0x5e9b2) [0x7fb7ff3709b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7fb7ffda1062]
>  13: (clone()+0x6d) [0x7fb7feb35a3d]
> 2014-07-09 23:21:43.394324 7fb7f7b83700 -1 *** Caught signal (Aborted) **
>  in thread 7fb7f7b83700
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7fb7ffda8880]
>  3: (gsignal()+0x39) [0x7fb7fea853a9]
>  4: (abort()+0x148) [0x7fb7fea884c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb7ff3725e5]
>  6: (()+0x5e746) [0x7fb7ff370746]
>  7: (()+0x5e773) [0x7fb7ff370773]
>  8: (()+0x5e9b2) [0x7fb7ff3709b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7fb7ffda1062]
>  13: (clone()+0x6d) [0x7fb7feb35a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2014-07-09 23:21:43.394324 7fb7f7b83700 -1 *** Caught signal
> (Aborted) **
>  in thread 7fb7f7b83700
>
>  ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
>  1: /usr/bin/ceph-mds() [0x8d81f2]
>  2: (()+0xf880) [0x7fb7ffda8880]
>  3: (gsignal()+0x39) [0x7fb7fea853a9]
>  4: (abort()+0x148) [0x7fb7fea884c8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fb7ff3725e5]
>  6: (()+0x5e746) [0x7fb7ff370746]
>  7: (()+0x5e773) [0x7fb7ff370773]
>  8: (()+0x5e9b2) [0x7fb7ff3709b2]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x40a) [0x9ab5da]
>  10: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
>  11: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
>  12: (()+0x8062) [0x7fb7ffda1062]
>  13: (clone()+0x6d) [0x7fb7feb35a3d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Aborted
>
> It felt like OSD1 was trashed, so I removed osd.0, osd.1, and osd.2.
>
> I'm still seeing the below, and can't get the MDS up.
>
> HEALTH_ERR 154 pgs degraded; 38 pgs inconsistent; 192 pgs stuck unclean;
> recovery 1374024/3513098 objects degraded (39.111%); 1374 scrub errors; mds
> cluster is degraded; mds MDS1 is laggy
>
> Is there something I can try to bring this file system up again? =P  I
> would like to access some of that data again.  Let me know if you need any
> additional info.  I was running Debian kernel 3.13.1 for the first part,
> then 3.14.1 when I upgraded ceph to 0.82.
>
> Regards,
> Hong
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
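
For anyone sifting through a long ceph-osd log for the heartbeat_map lines
quoted above, a small Python sketch along these lines may help.  It assumes
the log lines are unwrapped (one event per line, not folded by the mail
client), and the function name `heartbeat_events` is made up for
illustration; it is not part of Ceph.

```python
import re

# Matches unwrapped lines of the form seen in the quoted log, e.g.
#   ... heartbeat_map is_healthy 'FileStore::op_tp thread 0x...' had timed out after 60
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<thread>[^']+)' "
    r"had (?P<kind>suicide timed out|timed out) after (?P<secs>\d+)"
)


def heartbeat_events(log_text):
    """Return (thread, kind, seconds) for every heartbeat timeout line found."""
    return [
        (m.group("thread"), m.group("kind"), int(m.group("secs")))
        for m in HEARTBEAT_RE.finditer(log_text)
    ]


# The two events from the quoted osd.0 log, joined back onto single lines.
sample = (
    "2014-07-09 20:51:01.163571 7fe0f4617700  1 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7fe0e8e91700' had timed out after 60\n"
    "2014-07-09 20:51:01.163609 7fe0f4617700  1 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7fe0e8e91700' had suicide timed out after 180\n"
)
events = heartbeat_events(sample)
for thread, kind, secs in events:
    print(thread, kind, secs)
```

Grouping the results by thread name shows which work queue stalled first
and how long it sat between the ordinary timeout and the suicide timeout.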