Re: Flapping/Crashing OSD

On Thu, Feb 20, 2014 at 4:26 AM, Michael <michael@xxxxxxxxxxxxxxxxxx> wrote:
> Hi All,
>
> Have a log full of -
>
> "log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418]
> actual [46784'1235968,46797'1239418]"

Do you mean that this error message is showing up for a lot of
different PGs? This specific error indicates that the PG log doesn't
look quite as expected, but in this case it has more entries than it
should, which should be recoverable.
If it is showing up for a lot of PGs, though, it sounds like there may
have been an issue with LevelDB that resurrected a lot of deleted data
and left the store in an inconsistent state.
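
One way to check how many PGs are affected is to group the mismatch
lines by PG id. The following is only a minimal sketch, not a Ceph
tool: the function names are hypothetical, the regex is written
against the single message format quoted in this thread, and reading
"info (a,b]" as a range that excludes its lower bound is an assumption
taken from the bracket notation.

```python
import re
from collections import defaultdict

# Matches the message quoted in this thread (one unwrapped log line), e.g.:
#   log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418]
#   actual [46784'1235968,46797'1239418]
MISMATCH = re.compile(
    r"log \[ERR\] : (?P<pg>\S+) log bound mismatch, "
    r"info \((?P<info_tail>\d+'\d+),(?P<info_head>\d+'\d+)\] "
    r"actual \[(?P<actual_first>\d+'\d+),(?P<actual_last>\d+'\d+)\]"
)


def parse_version(text):
    """Turn "46784'1236417" into the tuple (46784, 1236417)."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


def summarize_mismatches(lines):
    """Group log-bound mismatches by PG id.

    Returns {pg: {"count": n, "extra_entries": bool}}.
    """
    summary = defaultdict(lambda: {"count": 0, "extra_entries": False})
    for line in lines:
        m = MISMATCH.search(line)
        if not m:
            continue
        entry = summary[m["pg"]]
        entry["count"] += 1
        # Assumption: info is (tail, head], so the tail itself is excluded.
        # A first log entry at or before the tail then means the log holds
        # more entries than info describes.
        if parse_version(m["actual_first"]) <= parse_version(m["info_tail"]):
            entry["extra_entries"] = True
    return dict(summary)
```

Run over the lines of /var/log/ceph/ceph-osd.4.log, a result with a
single key such as "1.9" would point at one odd PG, while many keys
would fit the LevelDB explanation better.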

The particular assert you're hitting supports that; an iterator is
becoming invalid when it shouldn't be.
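
To confirm that every crash in the log is this same assert and not a
mix of different failures, the "FAILED assert" lines can be tallied.
Again just a sketch with a hypothetical function name, and the pattern
assumes the "file: line: FAILED assert(condition)" layout seen in the
snippet below.

```python
import re
from collections import Counter

# Matches e.g.:
#   os/DBObjectMap.cc: 400: FAILED assert(!valid || cur_iter->valid())
FAILED_ASSERT = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)\s*$"
)


def count_failed_asserts(lines):
    """Count failed assertions, keyed by (source file, line number, condition)."""
    counts = Counter()
    for text in lines:
        m = FAILED_ASSERT.search(text)
        if m:
            counts[(m["file"], int(m["line"]), m["cond"])] += 1
    return counts
```

A single key in the result means the OSD is dying in the same place
each time it restarts.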

If the other OSDs are fine, I'd mark this OSD down and out, reformat
the drive, and let the cluster recover.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> "192.168.7.177:6800/15655 >> 192.168.7.183:6802/3348 pipe(0x20e4f00 sd=65
> :56394 s=2 pgs=24194 cs=1 l=0 c=0x19668f20).fault, initiating reconnect"
>
> and an OSD that showed as down. I started it up and data synced as
> expected, but then the OSD started crashing and restarting in a cycle.
>
> Log can be obtained from http://onlinefusion.co.uk/info/ceph-osd.4.zip
> (Trimmed the repeating parts so it's 160KB), snippet below.
> Any ideas what's wrong with it?
>
> --------------
>
> -1> 2014-02-20 11:54:55.703196 7fbca1278700  0 log [ERR] : 1.9 log bound
> mismatch, info (46784'1236417,46797'1239418] actual
> [46784'1235968,46797'1239418]
>      0> 2014-02-20 11:55:05.243723 7fbc9f274700 -1 os/DBObjectMap.cc: In
> function 'virtual bool DBObjectMap::DBObjectMapIteratorImpl::valid()' thread
> 7fbc9f274700 time 2014-02-20 11:55:05.240689
> os/DBObjectMap.cc: 400: FAILED assert(!valid || cur_iter->valid())
>
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>  1: /usr/bin/ceph-osd() [0x95b762]
>  2: (PG::_scan_list(ScrubMap&, std::vector<hobject_t,
> std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
>  3: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> ThreadPool::TPHandle&)+0x106) [0x871256]
>  4: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e)
> [0x871f1e]
>  5: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd)
> [0x740b8d]
>  6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
>  7: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
>  8: (()+0x7f8e) [0x7fbcbe3c6f8e]
>  9: (clone()+0x6d) [0x7fbcbc8e5a0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 0 lockdep
>    0/ 0 context
>    0/ 0 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 0 buffer
>    0/ 0 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 0 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 0 osd
>    0/ 0 optracker
>    0/ 0 objclass
>    0/ 0 filestore
>    0/ 0 journal
>    0/ 0 ms
>    1/ 5 mon
>    0/ 0 monc
>    1/ 5 paxos
>    0/ 0 tp
>    0/ 0 auth
>    1/ 5 crypto
>    0/ 0 finisher
>    0/ 0 heartbeatmap
>    0/ 0 perfcounter
>    1/ 5 rgw
>    1/ 5 javaclient
>    0/ 0 asok
>    0/ 0 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.4.log
> --- end dump of recent events ---
> 2014-02-20 11:55:05.343940 7fbc9f274700 -1 *** Caught signal (Aborted) **
>  in thread 7fbc9f274700
>
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>  1: /usr/bin/ceph-osd() [0x97dc70]
>  2: (()+0xfbd0) [0x7fbcbe3cebd0]
>  3: (gsignal()+0x37) [0x7fbcbc822037]
>  4: (abort()+0x148) [0x7fbcbc825698]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fbcbd12fe8d]
>  6: (()+0x5ef76) [0x7fbcbd12df76]
>  7: (()+0x5efa3) [0x7fbcbd12dfa3]
>  8: (()+0x5f1de) [0x7fbcbd12e1de]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x43d) [0xa3f64d]
>  10: /usr/bin/ceph-osd() [0x95b762]
>  11: (PG::_scan_list(ScrubMap&, std::vector<hobject_t,
> std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
>  12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> ThreadPool::TPHandle&)+0x106) [0x871256]
>  13: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e)
> [0x871f1e]
>  14: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd)
> [0x740b8d]
>  15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
>  16: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
>  17: (()+0x7f8e) [0x7fbcbe3c6f8e]
>  18: (clone()+0x6d) [0x7fbcbc8e5a0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- begin dump of recent events ---
>      0> 2014-02-20 11:55:05.343940 7fbc9f274700 -1 *** Caught signal
> (Aborted) **
>  in thread 7fbc9f274700
>
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>  1: /usr/bin/ceph-osd() [0x97dc70]
>  2: (()+0xfbd0) [0x7fbcbe3cebd0]
>  3: (gsignal()+0x37) [0x7fbcbc822037]
>  4: (abort()+0x148) [0x7fbcbc825698]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fbcbd12fe8d]
>  6: (()+0x5ef76) [0x7fbcbd12df76]
>  7: (()+0x5efa3) [0x7fbcbd12dfa3]
>  8: (()+0x5f1de) [0x7fbcbd12e1de]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x43d) [0xa3f64d]
>  10: /usr/bin/ceph-osd() [0x95b762]
>  11: (PG::_scan_list(ScrubMap&, std::vector<hobject_t,
> std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
>  12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
> ThreadPool::TPHandle&)+0x106) [0x871256]
>  13: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e)
> [0x871f1e]
>  14: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd)
> [0x740b8d]
>  15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
>  16: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
>  17: (()+0x7f8e) [0x7fbcbe3c6f8e]
>  18: (clone()+0x6d) [0x7fbcbc8e5a0d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --------------
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com