Re: Flapping/Crashing OSD

Thanks Gregory.

Currently it's just the one OSD with the issue. If it turns out to be a more general failure of that OSD, I'll rip it out and replace the drive.

-Michael

On 20/02/2014 17:55, Gregory Farnum wrote:
On Thu, Feb 20, 2014 at 4:26 AM, Michael <michael@xxxxxxxxxxxxxxxxxx> wrote:
Hi All,

Have a log full of -

"log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418] actual [46784'1235968,46797'1239418]"
Do you mean that error message is showing up for a lot of different
PGs? The specific error indicates that the PG log doesn't look quite
as expected, but in this case it's got more entries than it should
(the actual log tail, 46784'1235968, is older than the tail recorded in
the PG info, 46784'1236417), which should be recoverable.
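To make that comparison concrete, here is a minimal Python sketch (not Ceph code) that pulls the four epoch'version pairs out of the error message and reports which bound disagrees. The names `EVersion`, `parse_bound_mismatch` and `classify` are hypothetical, and the sketch assumes these pairs are ordered by epoch first, then version.

```python
import re
from typing import NamedTuple, Tuple


class EVersion(NamedTuple):
    """An epoch'version pair as printed in OSD logs, e.g. 46784'1236417.

    NamedTuple comparison is lexicographic (epoch first, then version);
    this sketch assumes that matches how Ceph orders these pairs.
    """
    epoch: int
    version: int


EVERSION_RE = re.compile(r"(\d+)'(\d+)")


def parse_bound_mismatch(msg: str) -> Tuple[EVersion, EVersion, EVersion, EVersion]:
    """Return (info_tail, info_head, actual_tail, actual_head) from the message."""
    pairs = [EVersion(int(e), int(v)) for e, v in EVERSION_RE.findall(msg)]
    if len(pairs) != 4:
        raise ValueError(f"expected 4 epoch'version pairs, found {len(pairs)}")
    return pairs[0], pairs[1], pairs[2], pairs[3]


def classify(msg: str) -> str:
    info_tail, info_head, actual_tail, actual_head = parse_bound_mismatch(msg)
    if actual_head != info_head:
        return "head mismatch"
    if actual_tail < info_tail:
        # The log on disk reaches further back than the PG info records,
        # i.e. it holds extra, older entries.
        return "extra older entries"
    if actual_tail > info_tail:
        return "tail newer than recorded"
    return "bounds agree"


msg = ("1.9 log bound mismatch, info (46784'1236417,46797'1239418] "
       "actual [46784'1235968,46797'1239418]")
info_tail, _, actual_tail, _ = parse_bound_mismatch(msg)
print(classify(msg))                            # extra older entries
print(info_tail.version - actual_tail.version)  # 449
```

The 449 is a difference in version numbers; the sketch does not claim it equals the number of extra log entries.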
If that's the case for a lot of PGs, though, it sounds like there may
have been an issue with LevelDB that resurrected a lot of deleted data,
leaving the store in an inconsistent state.
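One way to answer the "how many PGs?" question is to count the distinct PG ids that appear in the error lines. The Python sketch below does that with a regular expression; the helper name `count_mismatched_pgs` is hypothetical, and it assumes the unwrapped log format shown in the crash dump and PG ids printed as `<pool>.<hex seed>` (such as `1.9`).

```python
import re
from collections import Counter
from typing import Iterable

# Matches the ERR line from the log, e.g.
#   "... log [ERR] : 1.9 log bound mismatch, info (...] actual [...]"
# Assumes PG ids print as <pool>.<hex seed>, like "1.9" or "2.1f".
MISMATCH_RE = re.compile(r"log \[ERR\] : (\d+\.[0-9a-f]+) log bound mismatch")


def count_mismatched_pgs(lines: Iterable[str]) -> Counter:
    """Count 'log bound mismatch' errors per PG id."""
    counts: Counter = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Usage (path taken from the dump's log_file setting):
#   with open("/var/log/ceph/ceph-osd.4.log") as f:
#       counts = count_mismatched_pgs(f)
#   print(len(counts), "distinct PGs affected")
```

A single PG repeating many times points toward one bad PG log; many distinct PG ids fit the store-wide inconsistency described above.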

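The particular assert you're hitting supports that; an iterator is
becoming invalid when it shouldn't be (the object-map iterator still
considers itself valid while the iterator it wraps, cur_iter, does not).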
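The failed check `assert(!valid || cur_iter->valid())` is an implication: whenever the wrapper's own `valid` flag says it is positioned on an entry, the iterator it wraps must agree. The toy Python model below is hypothetical and is not DBObjectMap's real structure; it only shows how that invariant trips when the inner iterator runs out behind the wrapper's back, the kind of surprise an inconsistent store could produce.

```python
class ListIterator:
    """Hypothetical stand-in for a key/value store iterator."""

    def __init__(self, keys):
        self._keys = sorted(keys)
        self._pos = 0

    def valid(self) -> bool:
        return self._pos < len(self._keys)

    def next(self) -> None:
        self._pos += 1


class WrappingIterator:
    """Caches its own 'valid' flag, loosely like the wrapper in the assert."""

    def __init__(self, cur_iter: ListIterator):
        self.cur_iter = cur_iter
        self._valid = cur_iter.valid()

    def valid(self) -> bool:
        # Same shape as the Ceph check: "not valid, or the inner iterator is valid".
        if self._valid and not self.cur_iter.valid():
            raise AssertionError("FAILED assert(!valid || cur_iter->valid())")
        return self._valid


inner = ListIterator(["a", "b"])
outer = WrappingIterator(inner)
print(outer.valid())  # True: both layers agree
inner.next()
inner.next()          # the inner iterator runs out without the wrapper noticing
try:
    outer.valid()
except AssertionError as e:
    print("assert fired:", e)
```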

If the other OSDs are fine, I'd mark this OSD down and out, reformat
the drive, and let the cluster recover.
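A minimal sketch of the first half of that advice, driving the standard `ceph osd out` and `ceph osd down` commands from Python in dry-run mode. The helper names are hypothetical. Stopping the daemon, reformatting the drive and re-creating the OSD are deliberately left out because they depend on the distro and deployment tooling.

```python
import subprocess
from typing import List


def take_osd_out_commands(osd_id: int) -> List[List[str]]:
    """Build (but don't run) the CLI calls for marking an OSD out and down.

    Assumes the ceph-osd daemon for this id has already been stopped via the
    init system (not shown); a running daemon that is marked down will
    report itself up again.
    """
    return [
        ["ceph", "osd", "out", str(osd_id)],   # stop mapping data to it; recovery begins
        ["ceph", "osd", "down", str(osd_id)],  # mark it down in the OSD map
    ]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(take_osd_out_commands(4))  # osd.4 is the crashing OSD in this thread
```

Recovery progress can then be watched with `ceph -s` or `ceph -w` before the drive is reformatted.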
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

"192.168.7.177:6800/15655 >> 192.168.7.183:6802/3348 pipe(0x20e4f00 sd=65 :56394 s=2 pgs=24194 cs=1 l=0 c=0x19668f20).fault, initiating reconnect"

and an OSD that showed as down. I started it up and the data synced as expected,
but then the OSD started crashing and restarting in a loop.

The log can be obtained from http://onlinefusion.co.uk/info/ceph-osd.4.zip
(I trimmed the repeating parts, so it's 160KB); a snippet is below.
Any ideas what's wrong with it?

--------------

     -1> 2014-02-20 11:54:55.703196 7fbca1278700  0 log [ERR] : 1.9 log bound mismatch, info (46784'1236417,46797'1239418] actual [46784'1235968,46797'1239418]
      0> 2014-02-20 11:55:05.243723 7fbc9f274700 -1 os/DBObjectMap.cc: In function 'virtual bool DBObjectMap::DBObjectMapIteratorImpl::valid()' thread 7fbc9f274700 time 2014-02-20 11:55:05.240689
os/DBObjectMap.cc: 400: FAILED assert(!valid || cur_iter->valid())

  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
  1: /usr/bin/ceph-osd() [0x95b762]
  2: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
  3: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x106) [0x871256]
  4: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e) [0x871f1e]
  5: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd) [0x740b8d]
  6: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
  7: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
  8: (()+0x7f8e) [0x7fbcbe3c6f8e]
  9: (clone()+0x6d) [0x7fbcbc8e5a0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
    0/ 5 none
    0/ 0 lockdep
    0/ 0 context
    0/ 0 crush
    1/ 5 mds
    1/ 5 mds_balancer
    1/ 5 mds_locker
    1/ 5 mds_log
    1/ 5 mds_log_expire
    1/ 5 mds_migrator
    0/ 0 buffer
    0/ 0 timer
    0/ 1 filer
    0/ 1 striper
    0/ 1 objecter
    0/ 5 rados
    0/ 5 rbd
    0/ 0 journaler
    0/ 5 objectcacher
    0/ 5 client
    0/ 0 osd
    0/ 0 optracker
    0/ 0 objclass
    0/ 0 filestore
    0/ 0 journal
    0/ 0 ms
    1/ 5 mon
    0/ 0 monc
    1/ 5 paxos
    0/ 0 tp
    0/ 0 auth
    1/ 5 crypto
    0/ 0 finisher
    0/ 0 heartbeatmap
    0/ 0 perfcounter
    1/ 5 rgw
    1/ 5 javaclient
    0/ 0 asok
    0/ 0 throttle
   -2/-2 (syslog threshold)
   -1/-1 (stderr threshold)
   max_recent     10000
   max_new         1000
   log_file /var/log/ceph/ceph-osd.4.log
--- end dump of recent events ---
2014-02-20 11:55:05.343940 7fbc9f274700 -1 *** Caught signal (Aborted) **
  in thread 7fbc9f274700

  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
  1: /usr/bin/ceph-osd() [0x97dc70]
  2: (()+0xfbd0) [0x7fbcbe3cebd0]
  3: (gsignal()+0x37) [0x7fbcbc822037]
  4: (abort()+0x148) [0x7fbcbc825698]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fbcbd12fe8d]
  6: (()+0x5ef76) [0x7fbcbd12df76]
  7: (()+0x5efa3) [0x7fbcbd12dfa3]
  8: (()+0x5f1de) [0x7fbcbd12e1de]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0xa3f64d]
  10: /usr/bin/ceph-osd() [0x95b762]
  11: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&, bool, ThreadPool::TPHandle&)+0xed7) [0x86e3b7]
  12: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x106) [0x871256]
  13: (PG::replica_scrub(MOSDRepScrub*, ThreadPool::TPHandle&)+0x88e) [0x871f1e]
  14: (OSD::RepScrubWQ::_process(MOSDRepScrub*, ThreadPool::TPHandle&)+0xbd) [0x740b8d]
  15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68c) [0xa33d9c]
  16: (ThreadPool::WorkThread::entry()+0x10) [0xa34ff0]
  17: (()+0x7f8e) [0x7fbcbe3c6f8e]
  18: (clone()+0x6d) [0x7fbcbc8e5a0d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
