Re: Upgrade to 0.80.7-0.el6 from 0.80.1-0.el6, OSD crashes on startup

[root@ceph-node20 ~]# ls /var/lib/ceph/osd/us-west01-0/current

0.10_head  0.1a_head  0.23_head  0.2c_head  0.37_head  0.3_head  0.b_head  1.10_head  1.1b_head  1.24_head  1.2c_head  1.3a_head  1.b_head   2.16_head  2.1_head   2.2a_head  2.32_head  2.3a_head  2.a_head       omap

0.11_head  0.1d_head  0.25_head  0.2e_head  0.38_head  0.4_head  0.c_head  1.13_head  1.1d_head  1.26_head  1.2f_head  1.3b_head  1.e_head   2.1a_head  2.22_head  2.2c_head  2.33_head  2.3e_head  2.b_head

0.13_head  0.1f_head  0.26_head  0.2f_head  0.3b_head  0.5_head  0.d_head  1.16_head  1.1f_head  1.27_head  1.31_head  1.3e_head  2.0_head   2.1b_head  2.25_head  2.2e_head  2.36_head  2.3f_head  2.c_head

0.16_head  0.20_head  0.27_head  0.30_head  0.3c_head  0.6_head  0.e_head  1.18_head  1.20_head  1.29_head  1.36_head  1.3_head   2.10_head  2.1c_head  2.26_head  2.2f_head  2.37_head  2.4_head   commit_op_seq

0.18_head  0.21_head  0.28_head  0.33_head  0.3e_head  0.7_head  0.f_head  1.19_head  1.22_head  1.2a_head  1.37_head  1.4_head   2.11_head  2.1d_head  2.27_head  2.30_head  2.38_head  2.7_head   meta

0.19_head  0.22_head  0.29_head  0.35_head  0.3f_head  0.9_head  1.0_head  1.1a_head  1.23_head  1.2b_head  1.39_head  1.a_head   2.12_head  2.1e_head  2.28_head  2.31_head  2.39_head  2.8_head   nosnap


The output from the other command was too long to post, so here's the link to the full dump:

http://pastee.co/Kd1BlP

Here are the last 100-200 lines:

...

...

...

...

_HOBJTOSEQ_:pglog%u2%e2e...0.none.516B9E4C

_HOBJTOSEQ_:pglog%u2%e2f...0.none.516B9F1C

_HOBJTOSEQ_:pglog%u2%e30...0.none.516BFD4B

_HOBJTOSEQ_:pglog%u2%e31...0.none.516BF21B

_HOBJTOSEQ_:pglog%u2%e32...0.none.516BF3AB

_HOBJTOSEQ_:pglog%u2%e33...0.none.516BF37B

_HOBJTOSEQ_:pglog%u2%e36...0.none.516BF16B

_HOBJTOSEQ_:pglog%u2%e37...0.none.516BF63B

_HOBJTOSEQ_:pglog%u2%e38...0.none.516BF7CB

_HOBJTOSEQ_:pglog%u2%e39...0.none.516BF49B

_HOBJTOSEQ_:pglog%u2%e3a...0.none.516B933C

_HOBJTOSEQ_:pglog%u2%e3e...0.none.516B96FC

_HOBJTOSEQ_:pglog%u2%e3f...0.none.516B978C

_HOBJTOSEQ_:pglog%u2%e4...0.none.103ABD8E

_HOBJTOSEQ_:pglog%u2%e7...0.none.103AB3BE

_HOBJTOSEQ_:pglog%u2%e8...0.none.103AB34E

_HOBJTOSEQ_:pglog%u2%ea...0.none.103A5CBF

_HOBJTOSEQ_:pglog%u2%eb...0.none.103A5C4F

_HOBJTOSEQ_:pglog%u2%ec...0.none.103A5D1F

_SYS_:HEADER

*** Caught signal (Bus error) **

 in thread 7f64e92ce760

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

 1: ceph-kvstore-tool() [0x4bf2e1]

 2: (()+0xf710) [0x7f64e86e0710]

 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) [0x7f64e8e9f73b]

 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]

 5: (()+0x3a412) [0x7f64e8ea3412]

 6: (()+0x3a6f8) [0x7f64e8ea36f8]

 7: (()+0x3a78d) [0x7f64e8ea378d]

 8: (()+0x3761a) [0x7f64e8ea061a]

 9: (()+0x20fd2) [0x7f64e8e89fd2]

 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]

 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) [0x4b65fa]

 12: (main()+0x2cc) [0x4b26fc]

 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]

 14: ceph-kvstore-tool() [0x4b21b9]

2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus error) **

 in thread 7f64e92ce760


 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

 1: ceph-kvstore-tool() [0x4bf2e1]

 2: (()+0xf710) [0x7f64e86e0710]

 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) [0x7f64e8e9f73b]

 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]

 5: (()+0x3a412) [0x7f64e8ea3412]

 6: (()+0x3a6f8) [0x7f64e8ea36f8]

 7: (()+0x3a78d) [0x7f64e8ea378d]

 8: (()+0x3761a) [0x7f64e8ea061a]

 9: (()+0x20fd2) [0x7f64e8e89fd2]

 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]

 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) [0x4b65fa]

 12: (main()+0x2cc) [0x4b26fc]

 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]

 14: ceph-kvstore-tool() [0x4b21b9]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- begin dump of recent events ---

   -13> 2014-11-13 21:19:04.689722 7f64e92ce760  5 asok(0x1f1b5b0) register_command perfcounters_dump hook 0x1f1b510

   -12> 2014-11-13 21:19:04.689754 7f64e92ce760  5 asok(0x1f1b5b0) register_command 1 hook 0x1f1b510

   -11> 2014-11-13 21:19:04.689771 7f64e92ce760  5 asok(0x1f1b5b0) register_command perf dump hook 0x1f1b510

   -10> 2014-11-13 21:19:04.689778 7f64e92ce760  5 asok(0x1f1b5b0) register_command perfcounters_schema hook 0x1f1b510

    -9> 2014-11-13 21:19:04.689787 7f64e92ce760  5 asok(0x1f1b5b0) register_command 2 hook 0x1f1b510

    -8> 2014-11-13 21:19:04.689793 7f64e92ce760  5 asok(0x1f1b5b0) register_command perf schema hook 0x1f1b510

    -7> 2014-11-13 21:19:04.689803 7f64e92ce760  5 asok(0x1f1b5b0) register_command config show hook 0x1f1b510

    -6> 2014-11-13 21:19:04.689811 7f64e92ce760  5 asok(0x1f1b5b0) register_command config set hook 0x1f1b510

    -5> 2014-11-13 21:19:04.689818 7f64e92ce760  5 asok(0x1f1b5b0) register_command config get hook 0x1f1b510

    -4> 2014-11-13 21:19:04.689821 7f64e92ce760  5 asok(0x1f1b5b0) register_command log flush hook 0x1f1b510

    -3> 2014-11-13 21:19:04.689831 7f64e92ce760  5 asok(0x1f1b5b0) register_command log dump hook 0x1f1b510

    -2> 2014-11-13 21:19:04.689837 7f64e92ce760  5 asok(0x1f1b5b0) register_command log reopen hook 0x1f1b510

    -1> 2014-11-13 21:19:04.689940 7f64e92ce760 -1 did not load config file, using default settings.

     0> 2014-11-13 21:19:18.318941 7f64e92ce760 -1 *** Caught signal (Bus error) **

 in thread 7f64e92ce760


 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

 1: ceph-kvstore-tool() [0x4bf2e1]

 2: (()+0xf710) [0x7f64e86e0710]

 3: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x1cb) [0x7f64e8e9f73b]

 4: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x291) [0x7f64e8ea0de1]

 5: (()+0x3a412) [0x7f64e8ea3412]

 6: (()+0x3a6f8) [0x7f64e8ea36f8]

 7: (()+0x3a78d) [0x7f64e8ea378d]

 8: (()+0x3761a) [0x7f64e8ea061a]

 9: (()+0x20fd2) [0x7f64e8e89fd2]

 10: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x47) [0x4ba417]

 11: (StoreTool::traverse(std::string const&, bool, std::ostream*)+0x1da) [0x4b65fa]

 12: (main()+0x2cc) [0x4b26fc]

 13: (__libc_start_main()+0xfd) [0x7f64e77a2d1d]

 14: ceph-kvstore-tool() [0x4b21b9]

 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- logging levels ---

   0/ 5 none

   0/ 1 lockdep

   0/ 1 context

   1/ 1 crush

   1/ 5 mds

   1/ 5 mds_balancer

   1/ 5 mds_locker

   1/ 5 mds_log

   1/ 5 mds_log_expire

   1/ 5 mds_migrator

   0/ 1 buffer

   0/ 1 timer

   0/ 1 filer

   0/ 1 striper

   0/ 1 objecter

   0/ 5 rados

   0/ 5 rbd

   0/ 5 journaler

   0/ 5 objectcacher

   0/ 5 client

   0/ 5 osd

   0/ 5 optracker

   0/ 5 objclass

   1/ 3 filestore

   1/ 3 keyvaluestore

   1/ 3 journal

   0/ 5 ms

   1/ 5 mon

   0/10 monc

   1/ 5 paxos

   0/ 5 tp

   1/ 5 auth

   1/ 5 crypto

   1/ 1 finisher

   1/ 5 heartbeatmap

   1/ 5 perfcounter

   1/ 5 rgw

   1/ 5 javaclient

   1/ 5 asok

   1/ 1 throttle

  -2/-2 (syslog threshold)

  99/99 (stderr threshold)

  max_recent       500

  max_new         1000

  log_file 

--- end dump of recent events ---

Bus error



Joshua



On Thu, Nov 13, 2014 at 8:52 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Thu, 13 Nov 2014, Joshua McClintock wrote:
> I upgraded my mons to the latest version and they appear to work. I then
> upgraded my mds and it seems fine.
> I then upgraded one OSD node, and the OSD fails to start with the following
> dump. Any help is appreciated:
>
> --- begin dump of recent events ---
>
>      0> 2014-11-13 18:20:15.625793 7fbd973ce7a0 -1 *** Caught signal
> (Aborted) **
>
>  in thread 7fbd973ce7a0
>
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: /usr/bin/ceph-osd() [0x9bd2a1]
>  2: (()+0xf710) [0x7fbd96373710]
>  3: (gsignal()+0x35) [0x7fbd95245925]
>  4: (abort()+0x175) [0x7fbd95247105]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fbd95affa5d]
>  6: (()+0xbcbe6) [0x7fbd95afdbe6]
>  7: (()+0xbcc13) [0x7fbd95afdc13]
>  8: (()+0xbcd0e) [0x7fbd95afdd0e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x7f2) [0xafbe22]
>  10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> ceph::buffer::list*)+0x4ea) [0x7f729a]
>  11: (OSD::load_pgs()+0x18f1) [0x64f2b1]
>  12: (OSD::init()+0x22c0) [0x6536f0]
>  13: (main()+0x35bc) [0x5fe39c]
>  14: (__libc_start_main()+0xfd) [0x7fbd95231d1d]
>  15: /usr/bin/ceph-osd() [0x5f9e49]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.

Hey, this looks like a different report we saw recently off-list!  In that
case, they were upgrading from 0.80.4 to 0.80.7.  Opening #10105.

Are you only restarting a single OSD?  I would hold off on restarting any more
for the time being.

Can you attach the output from

        ls /var/lib/ceph/osd/ceph-NNN/current
and
        ceph-kvstore-tool /var/lib/ceph/osd/ceph-NNN/current/omap list

(you may need to install the ceph-tests rpm to get ceph-kvstore-tool).
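
Since the omap listing can be long, one option is to capture both outputs to files and attach those. The following is only a minimal sketch: the helper names (capture, collect_osd_diagnostics) and the output file names are made up for illustration, and ceph-NNN remains a placeholder for the actual OSD directory.

```python
import subprocess
from pathlib import Path


def capture(cmd, out_path):
    """Run cmd, write combined stdout/stderr to out_path, return the exit code."""
    with open(out_path, "wb") as out:
        proc = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode


def collect_osd_diagnostics(osd_dir, out_dir="."):
    """Save the two requested outputs for one OSD data directory.

    The output file names are hypothetical; pick whatever suits the report.
    """
    current = Path(osd_dir) / "current"
    out = Path(out_dir)
    return {
        "ls": capture(["ls", str(current)], out / "osd-current-ls.txt"),
        "omap": capture(
            ["ceph-kvstore-tool", str(current / "omap"), "list"],
            out / "osd-omap-list.txt",
        ),
    }


# Example (replace ceph-NNN with the real OSD directory):
# print(collect_osd_diagnostics("/var/lib/ceph/osd/ceph-NNN"))
```

On POSIX systems a negative exit code from subprocess means the command was killed by a signal, so a crash in the tool would still leave the partial listing in the file. If ceph-kvstore-tool is not installed, subprocess.run raises FileNotFoundError instead.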

Thanks!
sage


>
>
> --- logging levels ---
>
>    0/ 5 none
>
>    0/ 1 lockdep
>
>    0/ 1 context
>
>    1/ 1 crush
>
>    1/ 5 mds
>
>    1/ 5 mds_balancer
>
>    1/ 5 mds_locker
>
>    1/ 5 mds_log
>
>    1/ 5 mds_log_expire
>
>    1/ 5 mds_migrator
>
>    0/ 1 buffer
>
>    0/ 1 timer
>
>    0/ 1 filer
>
>    0/ 1 striper
>
>    0/ 1 objecter
>
>    0/ 5 rados
>
>    0/ 5 rbd
>
>    0/ 5 journaler
>
>    0/ 5 objectcacher
>
>    0/ 5 client
>
>    0/ 5 osd
>
>    0/ 5 optracker
>
>    0/ 5 objclass
>
>    1/ 3 filestore
>
>    1/ 3 keyvaluestore
>
>    1/ 3 journal
>
>    0/ 5 ms
>
>    1/ 5 mon
>
>    0/10 monc
>
>    1/ 5 paxos
>
>    0/ 5 tp
>
>    1/ 5 auth
>
>    1/ 5 crypto
>
>    1/ 1 finisher
>
>    1/ 5 heartbeatmap
>
>    1/ 5 perfcounter
>
>    1/ 5 rgw
>
>    1/ 5 javaclient
>
>    1/ 5 asok
>
>    1/ 1 throttle
>
>   -2/-2 (syslog threshold)
>
>   -1/-1 (stderr threshold)
>
>   max_recent     10000
>
>   max_new         1000
>
>   log_file /var/log/ceph/us-west01-osd.0.log
>
> --- end dump of recent events ---
>
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
